Liu, Weide, Lou, Jieming, Wang, Xingxing, Zhou, Wei, Cheng, Jun and Yang, Xulei 2025. Physically-guided open vocabulary segmentation with weighted patched alignment loss. Neurocomputing 614, 128788. 10.1016/j.neucom.2024.128788
Item availability restricted.
PDF (Accepted Post-Print Version, 2MB): Restricted to Repository staff only until 28 October 2025 due to copyright restrictions. Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Open vocabulary segmentation is a challenging task that aims to segment thousands of unseen categories. Directly applying CLIP to open-vocabulary semantic segmentation is difficult because of the granularity gap between its image-level contrastive learning and the pixel-level recognition required for segmentation. To address these challenges, we propose a unified pipeline that leverages physical structure regularization to enhance the generalizability and robustness of open vocabulary segmentation. By incorporating physical structure information, which is independent of the training data, we aim to reduce bias and improve the model’s performance on unseen classes. We use low-level structures such as edges and keypoints as regularization terms, because they are easy to obtain and strongly correlated with segmentation boundary information. These structures serve as pseudo-ground truth to supervise the model. Furthermore, inspired by the effectiveness of comparative learning in human cognition, we introduce the weighted patched alignment loss. This loss function contrasts similar and dissimilar samples to learn low-dimensional representations that capture the distinctions between object classes. By incorporating physical knowledge and leveraging the weighted patched alignment loss, we aim to improve the model’s generalizability, robustness, and capability to recognize diverse object classes. Experiments on the COCO Stuff, Pascal VOC, Pascal Context-59, Pascal Context-459, ADE20K-150, and ADE20K-847 datasets demonstrate that our proposed method consistently improves on baselines and achieves a new state of the art in the open vocabulary segmentation task.
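The abstract names two ingredients without giving their formulas: low-level edge structure used as pseudo-ground truth, and a weighted loss that contrasts similar and dissimilar patches. The following is a minimal illustrative sketch of those two general ideas only, and should not be read as the paper's formulation. Everything in it is an assumption: the Sobel operator as the edge source, the InfoNCE-style form of the contrastive term, the per-patch `weights`, the `temperature` value, and the function names `sobel_edges` and `weighted_patch_contrastive_loss`.

```python
import numpy as np


def sobel_edges(image, threshold=0.5):
    """Binary edge map of a 2-D grayscale image (hypothetical pseudo-label source).

    Applies 3x3 Sobel kernels by cross-correlation with edge padding, then
    thresholds the gradient magnitude relative to its maximum.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        # A constant image has no edges.
        return np.zeros((h, w), dtype=bool)
    return magnitude / peak > threshold


def weighted_patch_contrastive_loss(patches, labels, weights, temperature=0.1):
    """Weighted InfoNCE-style loss over patch embeddings (an assumed form).

    patches: (N, D) patch embeddings; labels: (N,) class ids;
    weights: (N,) non-negative per-patch weights (hypothetical, e.g. larger
    for patches close to an edge). Patches sharing a label are positives.
    """
    patches = np.asarray(patches, dtype=float)
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=float)
    n = labels.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    # Anchors without any positive contribute nothing.
    if not has_pos.any() or weights[has_pos].sum() == 0:
        return 0.0
    # Cosine similarity between L2-normalised embeddings, scaled by temperature.
    z = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-12)
    sim = z @ z.T / temperature
    # Log-softmax over all other patches (self-similarity is masked out).
    masked = np.where(not_self, sim, -np.inf)
    log_prob = masked - np.logaddexp.reduce(masked, axis=1, keepdims=True)
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    w = weights[has_pos]
    # Weighted mean of the per-anchor losses.
    return float((w * per_anchor).sum() / w.sum())
```

In this sketch, the edge map from `sobel_edges` could act as a boundary target for an auxiliary loss, while `weighted_patch_contrastive_loss` returns a smaller value when patches with the same label have similar embeddings than when labels and embeddings disagree.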
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Publisher: | Elsevier |
ISSN: | 0925-2312 |
Date of First Compliant Deposit: | 20 October 2024 |
Date of Acceptance: | 19 October 2024 |
Last Modified: | 07 Nov 2024 05:15 |
URI: | https://orca.cardiff.ac.uk/id/eprint/173144 |