ORCA
Online Research @ Cardiff

Uncovering and mitigating transient blindness in multimodal model editing

Han, Xiaoqi, Li, Ru, Yi, Ran, Tan, Hongye, Liang, Zhuomin, Gutierrez Basulto, Victor and Pan, Jeff Z. 2025. Uncovering and mitigating transient blindness in multimodal model editing. Presented at: 40th Annual AAAI Conference on Artificial Intelligence (AAAI'26), Singapore, 20-27 January 2026.

Abstract

Multimodal Model Editing (MMED) aims to correct erroneous knowledge in multimodal models. Existing evaluation methods, adapted from textual model editing, overstate success by relying on low-similarity or random inputs, which obscures overfitting. We propose a comprehensive locality evaluation framework covering three key dimensions (random-image locality, no-image locality, and consistent-image locality), operationalized through seven distinct data types, enabling a detailed and structured analysis of multimodal edits. We introduce De-VQA, a dynamic evaluation for visual question answering, which uncovers a phenomenon we term transient blindness: overfitting to edit-similar text while ignoring visuals. Token analysis shows that edits disproportionately affect textual tokens. We propose locality-aware adversarial losses to balance cross-modal representations. Empirical results demonstrate that our approach consistently outperforms existing baselines, reducing transient blindness and improving locality by 17% on average.

Item Type:	Conference or Workshop Item (Paper)
Status:	Unpublished
Schools:	Schools > Computer Science & Informatics
Related URLs:	https://aaai.org/conference/aaai/aaai-26...
Date of First Compliant Deposit:	27 November 2025
Date of Acceptance:	7 November 2025
Last Modified:	28 Nov 2025 16:00
URI:	https://orca.cardiff.ac.uk/id/eprint/182717
