Zhong-Yu Li

PhD student, Nankai University

Zhong-Yu Li is a PhD student at Nankai University, advised by Prof. Ming-Ming Cheng, and is expected to graduate in 2026. His research interests include computer vision and deep learning, with a focus on AIGC, visual generation, and representation learning.

Selected Publications

* Equal contribution. # Corresponding author. Representative papers are highlighted.

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li*, Ruoyi Du*, Juncheng Yan, Le Zhuo, Zhen Li#, Peng Gao, Zhanyu Ma, Ming-Ming Cheng#.
ICCV, 2025  
project / online demo / paper / github / bibtex

🔥🔥🔥 Supports a wide range of in-domain tasks and generalizes to unseen tasks.

💥💥💥 VisualCloze has been merged into the official diffusers pipelines. See the Model Card for details.

Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Boyuan Sun, Chun-Le Guo, Ming-Ming Cheng#.
CVPR, 2025 (Highlight)  
paper / code / bibtex
Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng#.
TPAMI, 2025
paper / code / bibtex
SERE: Exploring Feature Self-relation for Self-supervised Transformer
Zhong-Yu Li, Shanghua Gao#, Ming-Ming Cheng.
TPAMI, 2023
paper / code / bibtex
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quansheng Zeng, Qibin Hou#, Ming-Ming Cheng.
ICML, 2024
paper / code / bibtex
Large-scale Unsupervised Semantic Segmentation
Shanghua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng#, Junwei Han, Philip Torr.
TPAMI, 2023
paper / code / bibtex
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bo-Wen Yin, Xuying Zhang, Zhong-Yu Li, Li Liu, Ming-Ming Cheng, Qibin Hou#.
ICLR, 2024
paper / code / bibtex
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao, Zhong-Yu Li, Qi Han, Ming-Ming Cheng#, Liang Wang.
TPAMI, 2023
paper / code / bibtex
Global2local: Efficient structure search for video action segmentation
Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, Ming-Ming Cheng#.
CVPR, 2021
paper / code / bibtex
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
Dongyang Liu*, Shicheng Li*, Yutong Liu*, Zhen Li*, Kai Wang*, Xinyue Li*, Qi Qin, Yufei Liu, Yi Xin, Zhong-Yu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li#, Peng Gao#
arXiv, 2025
paper / code / bibtex
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Jiayi Lei*, Renrui Zhang*, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhong-Yu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao#, Hongsheng Li#
arXiv, 2025
paper / code / bibtex

Experience

PJLab
2024/12 - 2025/5
Shanghai

Research Intern

Worked with Peng Gao and Zhen Li, where I developed video/image data pipelines and completed VisualCloze, a universal image generation framework based on visual in-context learning. It supports a wide range of in-domain tasks and generalizes to unseen tasks.

Alibaba
2025/5 - Present
Alibaba, Quark

Foundation Model Team, Multimodal Large Model Algorithm Intern

Pretraining image-to-image foundation models (similar to GPT-4o-style image editing).

Building large-scale data pipelines for image-language-to-image generation datasets.

Education

2021/09 - 2026/06, I am a Ph.D. student at the College of Computer Science, Nankai University, under the supervision of Prof. Ming-Ming Cheng.

2017/09 - 2021/06, I was an undergraduate student at the College of Computer Science and Technology, Nankai University.

Contact

You are very welcome to contact me directly at lizhongyu [at] mail.nankai.edu.cn.