Cardiff University | Prifysgol Caerdydd ORCA

MiCEval: Unveiling multimodal chain of thought's quality via image description and reasoning steps

Zhou, Xiongtao, He, Jie, Chen, Lanyu, Li, Jingyu, Chen, Haojing, Gutierrez Basulto, Victor ORCID: https://orcid.org/0000-0002-6117-5459, Pan, Jeff Z. and Chen, Hanjie 2025. MiCEval: Unveiling multimodal chain of thought's quality via image description and reasoning steps. Presented at: 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Albuquerque, New Mexico, USA, 29 April - 4 May 2025. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Albuquerque, New Mexico: ACL, pp. 10002-10039. 10.18653/v1/2025.naacl-long.504

PDF (Published Version, 4MB), available under a Creative Commons Attribution License.

Abstract

**Multimodal Chain of Thought (MCoT)** is a popular prompting strategy for improving the performance of multimodal large language models (MLLMs) across a range of complex reasoning tasks. Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose **Multimodal Chain-of-Thought Evaluation (MiCEval)**, a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. The description component is evaluated for the accuracy of the image description, while each reasoning step is evaluated for its quality conditioned on the steps that precede it. MiCEval is built upon a fine-grained dataset with annotations that rate each step according to correctness, relevance, and informativeness. Extensive experiments on four state-of-the-art MLLMs show that step-wise evaluations using MiCEval align more closely with human judgments than existing methods based on cosine similarity or fine-tuning. MiCEval datasets and code are available at https://anonymous.4open.science/r/MiCEval-847F/README.md.
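
The step-wise protocol the abstract describes (score the image description for accuracy, then score each reasoning step conditioned on its predecessors along correctness, relevance, and informativeness) can be pictured with a short sketch. Everything below is illustrative: the names (`MCoTChain`, `judge_step`, `evaluate_chain`) and the mean aggregation are assumptions made for exposition, not the paper's implementation, which delegates scoring to an MLLM judge or human annotators.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the paper's annotation dimensions:
# each reasoning step is rated for correctness, relevance, informativeness.
DIMENSIONS = ("correctness", "relevance", "informativeness")

@dataclass
class ReasoningStep:
    text: str
    # Scores in [0, 1] per dimension; in MiCEval these would come from
    # an MLLM judge or human annotators, not from this placeholder.
    scores: dict = field(default_factory=dict)

@dataclass
class MCoTChain:
    image_description: str
    steps: list

def judge_step(step_text: str, context: list, description: str) -> dict:
    """Placeholder judge: the real framework conditions the score on the
    image, its description, and all preceding steps."""
    # Return neutral scores so the sketch stays runnable without a model.
    return {dim: 1.0 for dim in DIMENSIONS}

def evaluate_chain(chain: MCoTChain) -> dict:
    """Score the description, then each step conditioned on its predecessors."""
    # Description evaluation is stubbed out here; MiCEval checks the
    # description's accuracy against the image itself.
    description_score = 1.0

    context, step_scores = [], []
    for step in chain.steps:
        step.scores = judge_step(step.text, context, chain.image_description)
        step_scores.append(step.scores)
        context.append(step.text)  # later steps see the earlier ones

    # One plausible aggregation: mean per dimension across steps.
    aggregate = {
        dim: sum(s[dim] for s in step_scores) / len(step_scores)
        for dim in DIMENSIONS
    }
    return {"description": description_score, "steps": aggregate}

if __name__ == "__main__":
    chain = MCoTChain(
        image_description="A bar chart comparing four models' accuracy.",
        steps=[
            ReasoningStep("The tallest bar corresponds to Model C."),
            ReasoningStep("Therefore Model C has the highest accuracy."),
        ],
    )
    print(evaluate_chain(chain))
```

Passing the accumulated `context` into each `judge_step` call mirrors the abstract's point that a step's quality depends on the steps generated before it.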

Item Type: Conference or Workshop Item - published (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: ACL
ISBN: 979-8-89176-189-6
Date of First Compliant Deposit: 12 February 2025
Date of Acceptance: 22 January 2025
Last Modified: 27 January 2026 09:57
URI: https://orca.cardiff.ac.uk/id/eprint/176138
