Yuan, Kunhao, Schaefer, Gerald, Lai, Yukun
PDF - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Video instance segmentation (VIS) is an evolving research topic in computer vision that aims to simultaneously detect, segment, and track semantic objects across multiple video frames. However, existing VIS methods are typically unaware of the reliability of the training samples from insufficient and imbalanced datasets, leading to suboptimal performance. To address this challenge, we propose a memory-based conditional neural process (MemCNP) module to exploit the strengths of both memory networks and the CNP model, which handles heterogeneous latent space distributions for reliable modelling with insufficient data. Our MemCNP utilises predicted uncertainty to regularise VIS predictions as well as to identify reliable samples for effective training. Notably, our MemCNP is model-agnostic and can thus be seamlessly integrated into various VIS models to improve their performance. Extensive experiments on the YouTube-VIS and OVIS datasets demonstrate the effectiveness of MemCNP regardless of the underlying model architecture.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Schools > Computer Science & Informatics |
Publisher: | Elsevier |
ISSN: | 0925-2312 |
Date of First Compliant Deposit: | 27 September 2025 |
Date of Acceptance: | 30 August 2025 |
Last Modified: | 29 Sep 2025 09:45 |
URI: | https://orca.cardiff.ac.uk/id/eprint/181364 |