Zhong-Yu Li

PhD student, Nankai University

Zhong-Yu Li is a PhD student at Nankai University, advised by Prof. Ming-Ming Cheng, and is expected to graduate in 2026. His research interests include computer vision and deep learning, with a focus on AIGC, visual generation, and representation learning.

Selected Publications

* Equal contribution. # Corresponding author. Representative papers are highlighted.

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li*, Ruoyi Du*, Juncheng Yan, Le Zhuo, Zhen Li#, Peng Gao, Zhanyu Ma, Ming-Ming Cheng#.
ICCV, 2025  
project / online demo / paper / github / bibtex

🔥🔥🔥 Supports a wide range of in-domain tasks and generalizes to unseen tasks.

💥💥💥 VisualCloze has been merged into the official diffusers pipelines. See the Model Card for details.

Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Boyuan Sun, Chun-Le Guo, Ming-Ming Cheng#.
CVPR, 2025 (Highlight)  
paper / code / bibtex
Enhancing Representations through Heterogeneous Self-Supervised Learning
Zhong-Yu Li, Bo-Wen Yin, Yongxiang Liu, Li Liu, Ming-Ming Cheng#.
TPAMI, 2025
paper / code / bibtex
SERE: Exploring Feature Self-relation for Self-supervised Transformer
Zhong-Yu Li, Shanghua Gao#, Ming-Ming Cheng.
TPAMI, 2023
paper / code / bibtex
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Yunheng Li, Zhong-Yu Li, Quansheng Zeng, Qibin Hou#, Ming-Ming Cheng.
ICML, 2024
paper / code / bibtex
Large-scale Unsupervised Semantic Segmentation
Shanghua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng#, Junwei Han, Philip Torr.
TPAMI, 2023
paper / code / bibtex
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bo-Wen Yin, Xuying Zhang, Zhong-Yu Li, Li Liu, Ming-Ming Cheng, Qibin Hou#.
ICLR, 2024
paper / code / bibtex
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao, Zhong-Yu Li, Qi Han, Ming-Ming Cheng#, Liang Wang.
TPAMI, 2023
paper / code / bibtex
Global2local: Efficient structure search for video action segmentation
Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, Ming-Ming Cheng#.
CVPR, 2021
paper / code / bibtex
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
Dongyang Liu*, Shicheng Li*, Yutong Liu*, Zhen Li*, Kai Wang*, Xinyue Li*, Qi Qin, Yufei Liu, Yi Xin, Zhong-Yu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li#, Peng Gao#
arXiv, 2025
paper / code / bibtex
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Jiayi Lei*, Renrui Zhang*, Xiangfei Hu, Weifeng Lin, Zhen Li, Wenjian Sun, Ruoyi Du, Le Zhuo, Zhong-Yu Li, Xinyue Li, Shitian Zhao, Ziyu Guo, Yiting Lu, Peng Gao#, Hongsheng Li#
arXiv, 2025
paper / code / bibtex

Experience

PJLab
2024/12 - 2025/5
Shanghai

Research Intern

Worked with Peng Gao and Zhen Li, where I developed video/image data pipelines and completed VisualCloze, a universal image generation framework based on visual in-context learning. It supports a wide range of in-domain tasks and generalizes to unseen tasks.

Alibaba
2025/5 - Present
Alibaba, Quark

Foundation Model Team, Multimodal Large Model Algorithm Intern

Pretraining image-to-image foundation models (similar to GPT-4o-style image editing).

Building large-scale data pipelines for image-language-to-image generation datasets.

Education

2021/09 - 2026/06, I am a Ph.D. student at the College of Computer Science, Nankai University, under the supervision of Prof. Ming-Ming Cheng.

2017/09 - 2021/06, I was an undergraduate student at the College of Computer Science and Technology, Nankai University.

Contact

You are very welcome to contact me directly at lizhongyu [at] mail.nankai.edu.cn.