| Zhang, Lingli, Li, Tianrui, Lu, Baiyu, Fang, Junlin, Zheng, Desheng, Zhou, Wei, Liu, Weide and Lv, Fengmao 2026. Rethinking the effect of unimodal labels in multimodal sentiment analysis. ACM Transactions on Multimedia Computing, Communications, and Applications, 3796718. 10.1145/3796718 |
Abstract
Multimodal sentiment analysis aims to comprehensively understand human sentiment by integrating diverse modalities, such as text, audio, and vision. To improve modality complementarity, the recent Multimodal Multi-task Learning (MML) framework jointly trains unimodal and multimodal sentiment analysis tasks using per-modality sub-annotations. In this work, we draw attention to the observation that integrating unimodal tasks may introduce conflicting task information, degrading performance on the multimodal task. Motivated by this issue, we propose the Multimodal Task Correlation-aware Learning (MTCL) framework to leverage beneficial task correlations and suppress harmful ones. Specifically, MTCL introduces a Correlation-Adaptive Training (CAT) strategy to learn a task-relation-aware unimodal encoder for each modality. First, to determine whether a sample contains conflicting information, the CAT strategy incorporates a Dual-Branch Contrast (DBC) module, which divides the training set into a beneficial subset and a harmful subset. Based on this division, the CAT strategy uses an adaptive training loss to guide the model in understanding nuanced multitask correlations. The adaptive training loss has two components: (1) for the beneficial subset, a contrastive loss improves the model's ability to extract complementary representations; (2) for the harmful subset, a task-correction loss mitigates the negative interference caused by harmful task associations. With the CAT strategy, our framework can effectively distinguish beneficial from harmful task correlations and extract distinctive and robust unimodal representations. Extensive experiments on several multimodal video sentiment analysis benchmarks verify the superiority of MTCL. Our work is publicly available at https://github.com/tiggers23/MTCL.
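The two-part adaptive training loss described in the abstract can be pictured as a routing step: each training sample contributes one of two loss terms depending on which subset it falls into. The following is a minimal sketch of that routing only, not the authors' implementation. The function `adaptive_loss`, the sign-based split `sign_conflict`, and both toy loss functions are hypothetical placeholders introduced here for illustration; the abstract does not specify how the DBC module makes its division or the exact form of the contrastive and task-correction losses.

```python
from typing import Callable, Sequence


def adaptive_loss(
    samples: Sequence[dict],
    is_harmful: Callable[[dict], bool],
    beneficial_loss: Callable[[dict], float],
    harmful_loss: Callable[[dict], float],
) -> float:
    """Average a per-sample loss chosen by subset membership.

    Hypothetical helper: `is_harmful` stands in for the subset division,
    and the two loss callables stand in for the contrastive and
    task-correction terms.
    """
    if not samples:
        return 0.0
    total = 0.0
    for sample in samples:
        if is_harmful(sample):
            total += harmful_loss(sample)
        else:
            total += beneficial_loss(sample)
    return total / len(samples)


# Toy placeholders below are assumptions, not the paper's definitions.
def sign_conflict(sample: dict) -> bool:
    # Flag a sample when its unimodal and multimodal labels have opposite signs.
    return sample["y_uni"] * sample["y_multi"] < 0


def toy_beneficial(sample: dict) -> float:
    # Placeholder term: squared error toward the unimodal label.
    return (sample["pred"] - sample["y_uni"]) ** 2


def toy_harmful(sample: dict) -> float:
    # Placeholder term: squared error toward the multimodal label.
    return (sample["pred"] - sample["y_multi"]) ** 2


batch = [
    {"pred": 0.5, "y_uni": 1.0, "y_multi": 1.0},   # labels agree
    {"pred": 0.5, "y_uni": -1.0, "y_multi": 1.0},  # labels conflict
]
loss = adaptive_loss(batch, sign_conflict, toy_beneficial, toy_harmful)
```

Passing the split rule and both loss terms as callables keeps the routing logic separate from the unspecified details, so any actual division criterion or loss definition from the released code could be substituted without changing the structure.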
| Item Type: | Article |
|---|---|
| Date Type: | Published Online |
| Status: | In Press |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | Association for Computing Machinery (ACM) |
| ISSN: | 1551-6857 |
| Date of Acceptance: | 3 January 2026 |
| Last Modified: | 23 Mar 2026 12:31 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/185959 |