Nogueira, Keiller; Zaytar, Akram; Ma, Wanli; Roscher, Ribana; Hänsch, Ronny; Robinson, Caleb; Ortiz, Anthony; Nsutezo, Simone; Dodhia, Rahul; Lavista Ferres, Juan M.; Karakus, Oktay (ORCID: https://orcid.org/0000-0001-8009-9319); and Rosin, Paul L. (ORCID: https://orcid.org/0000-0002-4965-3884)
2026.
Core-set selection for data-efficient land cover segmentation.
IEEE Access
10.1109/access.2026.3659734
PDF: Accepted Post-Print Version (12MB), available under a Creative Commons Attribution license.
Abstract
The increasing accessibility of remotely sensed data and their potential to support large-scale decision-making have driven the development of deep learning models for many Earth Observation tasks. Traditionally, such models rely on large datasets. However, the common assumption that larger training datasets lead to better performance tends to overlook data redundancy, noise, and the computational cost of processing massive datasets. Effective solutions must therefore consider not only the quantity but also the quality of data. To this end, we introduce six basic core-set selection approaches (relying on imagery only, labels only, or a combination of both) and investigate whether they can identify high-quality subsets of data capable of maintaining, or even surpassing, the performance achieved when using full datasets for remote sensing semantic segmentation. We benchmark these approaches against two traditional baselines on three widely used land-cover classification datasets (DFC2022, Vaihingen, and Potsdam) using two different architectures (SegFormer and U-Net), thus establishing a general baseline for future work. Our experiments show that all proposed methods consistently outperform the baselines across multiple subset sizes, with some approaches even selecting core sets that surpass training on all available data. Notably, on DFC2022, a selected subset comprising only 25% of the training data yields slightly higher SegFormer performance than training with the entire dataset. This result shows the importance and potential of data-centric learning for the remote sensing domain. The code is available at https://github.com/keillernogueira/data-centric-rs-classification/.
| Item Type: | Article |
|---|---|
| Date Type: | Published Online |
| Status: | In Press |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | Institute of Electrical and Electronics Engineers |
| ISSN: | 2169-3536 |
| Date of First Compliant Deposit: | 9 February 2026 |
| Last Modified: | 09 Feb 2026 14:00 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/184548 |