Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Modelling multi-modal cross-interaction for multi-label few-shot image classification based on local feature selection

Yan, Kun, Bouraoui, Zied, Wei, Fangyun, Xu, Chang, Wang, Ping, Jameel, Shoaib and Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881 2025. Modelling multi-modal cross-interaction for multi-label few-shot image classification based on local feature selection. ACM Transactions on Multimedia Computing, Communications and Applications 10.1145/3711867

PDF - Accepted Post-Print Version

Abstract

The aim of multi-label few-shot image classification (ML-FSIC) is to assign semantic labels to images in settings where only a small number of training examples are available for each label. A key feature of the multi-label setting is that an image often has several labels, which typically refer to objects appearing in different regions of the image. When estimating label prototypes in a metric-based setting, it is thus important to determine which regions are relevant for which labels, but the limited amount of training data and the noisy nature of local features make this highly challenging. As a solution, we propose a strategy in which label prototypes are gradually refined. First, we initialize the prototypes using word embeddings, which allows us to leverage prior knowledge about the meaning of the labels. Second, taking advantage of these initial prototypes, we use a Loss Change Measurement (LCM) strategy to select the local features from the training images (i.e. the support set) that are most likely to be representative of a given label. Third, we construct the final prototype of the label by aggregating these representative local features using a multi-modal cross-interaction mechanism, which again relies on the initial word embedding-based prototypes. Experiments on COCO, PASCAL VOC, NUS-WIDE, and iMaterialist show that our model substantially improves on the current state of the art.
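
The three-step refinement described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how LCM scores features or how the cross-interaction mechanism is built, so the sketch swaps in plain cosine-similarity top-k selection for the LCM step and a softmax-weighted average for the aggregation step. It also assumes the word-embedding prototype has already been mapped into the same space as the local visual features. The names `cosine`, `refine_prototype`, `top_k` and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def refine_prototype(
    initial_proto: Vector,
    local_features: Sequence[Vector],
    top_k: int = 2,
    temperature: float = 0.1,
) -> List[float]:
    """Refine a label prototype from support-set local features.

    Step 1 (assumed done by the caller): initial_proto comes from a word
    embedding projected into the visual feature space.
    Step 2 (stand-in for LCM): keep the top_k local features that are most
    similar to the initial prototype.
    Step 3 (stand-in for cross-interaction): average the kept features,
    weighted by a softmax over their similarity to the initial prototype.
    """
    if not local_features:
        # No visual evidence: fall back to the word-embedding prototype.
        return list(initial_proto)

    scored = sorted(
        ((cosine(initial_proto, f), f) for f in local_features),
        key=lambda pair: pair[0],
        reverse=True,
    )
    selected = scored[:top_k]

    # Numerically stable softmax over the similarity scores.
    max_score = max(score for score, _ in selected)
    weights = [math.exp((score - max_score) / temperature) for score, _ in selected]
    total = sum(weights)

    dim = len(initial_proto)
    return [
        sum(w * feat[i] for w, (_, feat) in zip(weights, selected)) / total
        for i in range(dim)
    ]


# Toy usage: two of the four local features point roughly the same way as
# the initial prototype, so only those two contribute to the result.
word_proto = [1.0, 0.0]
support_features = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
refined = refine_prototype(word_proto, support_features, top_k=2)
```

In this toy setting the features `[0.0, 1.0]` and `[-1.0, 0.0]` are discarded in the selection step, which mirrors the abstract's motivation: local features from image regions unrelated to the label should not contaminate the prototype.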

Item Type: Article
Date Type: Published Online
Status: In Press
Schools: Computer Science & Informatics
Publisher: Association for Computing Machinery (ACM)
ISSN: 1551-6857
Date of First Compliant Deposit: 23 January 2025
Date of Acceptance: 10 December 2024
Last Modified: 23 Jan 2025 15:15
URI: https://orca.cardiff.ac.uk/id/eprint/175190
