Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Multi-level mixture of experts for multimodal entity linking

Hu, Zhiwei, Gutierrez Basulto, Victor ORCID: https://orcid.org/0000-0002-6117-5459, Xiang, Zhiliang ORCID: https://orcid.org/0000-0002-0263-7289, Li, Ru and Pan, Jeff Z. 2025. Multi-level mixture of experts for multimodal entity linking. Presented at: 31st SIGKDD Conference on Knowledge Discovery and Data Mining, Toronto, Canada, 3-7 August 2025. Published in: Antonie, A., Pei, J., Yu, X., Chierichetti, F., Lauw, H. W., Sun, Y. and Parthasarathy, S. eds. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. New York, NY, USA: Association for Computing Machinery, pp. 979-990. 10.1145/3711896.3737060

PDF (3711896.3737060.pdf) - Published Version (1MB)
Available under License Creative Commons Attribution.

Abstract

Multimodal Entity Linking (MEL) aims to link ambiguous mentions within multimodal contexts to associated entities in a multimodal knowledge base. Existing approaches to MEL introduce multimodal interaction and fusion mechanisms to bridge the modality gap and enable multi-grained semantic matching. However, they do not address two important problems: (i) mention ambiguity, i.e., the lack of semantic content caused by the brevity and omission of key information in the mention's textual context; (ii) dynamic selection of modal content, i.e., dynamically distinguishing the importance of different parts of the modal information. To mitigate these issues, we propose a Multi-level Mixture of Experts (MMoE) model for MEL. MMoE has four components: (i) the description-aware mention enhancement module leverages large language models to identify the WikiData descriptions that best match a mention, considering the mention's textual context; (ii) the multimodal feature extraction module adopts multimodal feature encoders to obtain textual and visual embeddings for both mentions and entities; (iii)-(iv) the intra-level mixture of experts and inter-level mixture of experts modules apply a switch mixture of experts mechanism to dynamically and adaptively select features from relevant regions of information. Extensive experiments on the WikiMEL, RichpediaMEL and WikiDiverse datasets demonstrate the outstanding performance of MMoE compared to the state of the art. MMoE's code is available at: https://github.com/zhiweihu1103/MEL-MMoE.
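To illustrate the switch mixture-of-experts mechanism referred to in components (iii)-(iv), below is a minimal PyTorch sketch of top-1 ("switch") expert routing over a feature sequence. It is an assumption-laden illustration, not the authors' implementation (see the linked repository for that); the class name, expert count, and dimensions are hypothetical.

```python
# Minimal sketch of a switch (top-1) mixture-of-experts layer in PyTorch.
# All names and hyperparameters here are illustrative, not from the MMoE release.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchMoE(nn.Module):
    """Routes each feature vector to a single expert chosen by a learned gate."""

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) features, e.g. fused mention/entity representations
        probs = F.softmax(self.gate(x), dim=-1)   # (batch, seq, num_experts)
        top_p, top_idx = probs.max(dim=-1)        # top-1 ("switch") routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                   # positions assigned to expert e
            if mask.any():
                # scale each expert output by its gate probability
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out


if __name__ == "__main__":
    layer = SwitchMoE(dim=128)
    feats = torch.randn(2, 16, 128)               # dummy multimodal feature sequence
    print(layer(feats).shape)                     # torch.Size([2, 16, 128])
```

Top-1 routing keeps the per-token compute constant regardless of the number of experts, which is why switch-style gating is a common choice for adaptively weighting different regions of modal information.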

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for Computing Machinery
ISBN: 9798400714542
Date of First Compliant Deposit: 3 June 2025
Date of Acceptance: 16 May 2025
Last Modified: 28 Aug 2025 12:31
URI: https://orca.cardiff.ac.uk/id/eprint/178724
