Zhao, Ganlong, Li, Guanbin, Qin, Yipeng ORCID: https://orcid.org/0000-0002-1551-9126, Zhang, Jinjin, Chai, Zhenhua, Wei, Xiaolin, Lin, Liang and Yu, Yizhou
2024.
Exploration and exploitation of unlabeled data for open-set semi-supervised learning.
International Journal of Computer Vision
10.1007/s11263-024-02155-y
Item availability restricted. PDF (Accepted Post-Print Version): restricted to repository staff only until 8 July 2025 due to copyright restrictions.
Abstract
In this paper, we address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples. Unlike previous methods that only consider ID samples to be useful and aim to filter out OOD ones completely during training, we argue that the exploration and exploitation of both ID and OOD samples can benefit SSL. To support our claim, (i) we propose a prototype-based clustering and identification algorithm that explores the inherent similarity and difference among samples at the feature level and effectively clusters them around several predefined ID and OOD prototypes, thereby enhancing feature learning and facilitating ID/OOD identification; (ii) we propose an importance-based sampling method that exploits the difference in importance of each ID and OOD sample to SSL, thereby reducing the sampling bias and improving the training. Our proposed method achieves state-of-the-art results on several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent from the unlabeled data.
Item Type: | Article |
---|---|
Date Type: | Published Online |
Status: | In Press |
Schools: | Computer Science & Informatics |
Publisher: | Springer |
ISSN: | 0920-5691 |
Date of First Compliant Deposit: | 18 July 2024 |
Date of Acceptance: | 17 June 2024 |
Last Modified: | 18 Jul 2024 10:46 |
URI: | https://orca.cardiff.ac.uk/id/eprint/170425 |