Physically-guided open vocabulary segmentation with weighted patched alignment loss

Liu, Weide, Lou, Jieming, Wang, Xingxing, Zhou, Wei, Cheng, Jun and Yang, Xulei 2025. Physically-guided open vocabulary segmentation with weighted patched alignment loss. Neurocomputing 614, 128788. 10.1016/j.neucom.2024.128788
Item availability restricted.

PDF - Accepted Post-Print Version
Restricted to Repository staff only until 28 October 2025 due to copyright restrictions.
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://doi.org/10.1016/j.neucom.2024.128788

Abstract

Open-vocabulary segmentation is a challenging task that aims to segment thousands of unseen categories. Directly applying CLIP to open-vocabulary semantic segmentation is difficult because of the granularity gap between its image-level contrastive learning and the pixel-level recognition that segmentation requires. To address these challenges, we propose a unified pipeline that leverages physical structure regularization to enhance the generalizability and robustness of open-vocabulary segmentation. By incorporating physical structure information, which is independent of the training data, we aim to reduce bias and improve the model’s performance on unseen classes. We use low-level structures such as edges and keypoints as regularization terms because they are easier to obtain and strongly correlated with segmentation boundary information. These structures serve as pseudo-ground truth to supervise the model. Furthermore, inspired by the effectiveness of comparative learning in human cognition, we introduce the weighted patched alignment loss. This loss function contrasts similar and dissimilar samples to acquire low-dimensional representations that capture the distinctions between different object classes. By combining physical knowledge with the weighted patched alignment loss, we aim to improve the model’s generalizability, robustness, and capability to recognize diverse object classes. Experiments on the COCO Stuff, Pascal VOC, Pascal Context-59, Pascal Context-459, ADE20K-150, and ADE20K-847 datasets demonstrate that our proposed method consistently improves on the baselines and achieves new state-of-the-art results in the open-vocabulary segmentation task.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Computer Science & Informatics
Publisher:	Elsevier
ISSN:	0925-2312
Date of First Compliant Deposit:	20 October 2024
Date of Acceptance:	19 October 2024
Last Modified:	07 Nov 2024 05:15
URI:	https://orca.cardiff.ac.uk/id/eprint/173144
