Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Large multimodal models evaluation: a survey

Zhang, Zicheng, Wang, Junying, Wen, Farong, Guo, Yijin, Zhao, Xiangyu, Fang, Xinyu, Ding, Shengyuan, Jia, Ziheng, Xiao, Jiahao, Shen, Ye, Zheng, Yushuo, Zhu, Xiaorong, Wu, Yalun, Jiao, Ziheng, Sun, Wei, Chen, Zijian, Zhang, Kaiwei, Fu, Kang, Cao, Yuqin, Hu, Ming, Zhou, Yue, Zhou, Xuemei, Cao, Juntai, Zhou, Wei, Cao, Jinyu, Li, Ronghui, Zhou, Donghao, Tian, Yuan, Zhu, Xiangyang, Li, Chunyi, Wu, Haoning, Liu, Xiaohong, He, Junjun, Zhou, Yu, Liu, Hui, Zhang, Lin, Wang, Zesheng, Duan, Huiyu, Zhou, Yingjie, Min, Xiongkuo, Jia, Qi, Zhou, Dongzhan, Zhang, Wenlong, Cao, Jiezhang, Yang, Xue, Yu, Junzhi, Zhang, Songyang, Duan, Haodong and Zhai, Guangtao 2025. Large multimodal models evaluation: a survey. SCIENCE CHINA Information Sciences 68 , 221301. 10.1007/s11432-025-4676-4

Full text not available from this repository.

Abstract

As large multimodal models (LMMs) advance rapidly across diverse multimodal understanding and generation tasks, the need for systematic and reliable evaluation frameworks becomes increasingly critical. To address this need, this survey provides a structured overview of LMM evaluation, centered on two main axes: multimodal evaluation for understanding and for generation. (1) For understanding, a dual-perspective framework is introduced to distinguish between benchmarks for general capabilities, which emphasize common tasks, and those for specialized capabilities, which reflect expert-level competence in domain-specific fields. (2) For generation, evaluation is organized by output modality, including image, video, audio, and 3D content. (3) From a community perspective, this survey further highlights authoritative leaderboards and foundational tools that have been instrumental in establishing a comprehensive evaluation ecosystem for LMMs. By unifying general-specialized understanding and modality-specific generation evaluations, this survey clarifies the current landscape and provides guidance for future research in the LMM evaluation field.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: Springer
ISSN: 1674-733X
Date of Acceptance: 8 November 2025
Last Modified: 01 Dec 2025 14:30
URI: https://orca.cardiff.ac.uk/id/eprint/182774
