Explore In-Context Segmentation via Latent Diffusion Models

Wang, Chaoyang; Li, Xiangtai; Ding, Henghui; Qi, Lu; Zhang, Jiangning; Tong, Yunhai; Loy, Chen Change; Yan, Shuicheng

arXiv:2403.09616 (cs)

[Submitted on 14 Mar 2024 (v1), last revised 9 Mar 2025 (this version, v2)]

Abstract: In-context segmentation has drawn increasing attention with the advent of vision foundation models. Its goal is to segment objects using given reference images. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. This work approaches the problem from a fresh perspective: unlocking the capability of the latent diffusion model (LDM) for in-context segmentation and investigating different design choices. Specifically, we examine the problem from three angles: instruction extraction, output alignment, and meta-architectures. We design a two-stage masking strategy to prevent interfering information from leaking into the instructions. In addition, we propose an augmented pseudo-masking target to ensure the model predicts without forgetting the original images. Moreover, we build a new and fair in-context segmentation benchmark that covers both image and video datasets. Experiments validate the effectiveness of our approach, demonstrating comparable or even stronger results than previous specialist or visual foundation models. We hope our work inspires others to rethink the unification of segmentation and generation.

Comments:	AAAI 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.09616 [cs.CV]
	(or arXiv:2403.09616v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.09616

Submission history

From: Chaoyang Wang
[v1] Thu, 14 Mar 2024 17:52:31 UTC (4,568 KB)
[v2] Sun, 9 Mar 2025 11:58:01 UTC (2,839 KB)
