Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Exploring the potential of ChatGPT-4o in thyroid nodule diagnosis using multi-modality ultrasound imaging: Dual- vs. triple-modality approaches

Chen, Ziman, Chambara, Nonhlanhla (ORCID: https://orcid.org/0000-0002-3183-883X), Liu, Shirley Yuk Wah, Chow, Tom Chi Man, Lai, Carol Man Sze and Ying, Michael Tin Cheung 2025. Exploring the potential of ChatGPT-4o in thyroid nodule diagnosis using multi-modality ultrasound imaging: Dual- vs. triple-modality approaches. Cancers 17 (13), 2068. 10.3390/cancers17132068

License URL: https://creativecommons.org/licenses/by/4.0/
License Start date: 20 June 2025

Abstract

Background/Objectives: Recent advancements in large language models, such as ChatGPT-4o, have created new opportunities for analyzing complex multi-modal data, including medical images. This study aims to assess the potential of ChatGPT-4o in distinguishing between benign and malignant thyroid nodules via multi-modality ultrasound imaging: grayscale ultrasound, color Doppler ultrasound (CDUS), and shear wave elastography (SWE).

Materials and Methods: Patients who underwent thyroid nodule ultrasound examinations and had confirmed pathological diagnoses were included. ChatGPT-4o analyzed the multi-modality ultrasound data using two approaches: (1) a dual-modality strategy, which employed grayscale ultrasound and CDUS, and (2) a triple-modality strategy, which incorporated grayscale ultrasound, CDUS, and SWE. Diagnostic performance was compared against pathological findings using receiver operating characteristic (ROC) curve analysis, while consistency was evaluated through Cohen’s Kappa analysis.

Results: A total of 106 thyroid nodules were evaluated; 65.1% were benign and 34.9% malignant. In the dual-modality approach, ChatGPT-4o achieved an area under the ROC curve (AUC) of 66.3%, moderate agreement with pathology results (Kappa = 0.298), a sensitivity of 70.3%, a specificity of 62.3%, and an accuracy of 65.1%. Conversely, the triple-modality approach exhibited higher specificity at 97.1% but lower sensitivity at 18.9%, with an accuracy of 69.8% and a reduced overall agreement (Kappa = 0.194), resulting in an AUC of 58.0%.

Conclusions: ChatGPT-4o exhibits potential, to some extent, in classifying thyroid nodules using multi-modality ultrasound imaging. However, the dual-modality approach unexpectedly outperforms the triple-modality approach. This indicates that ChatGPT-4o might encounter challenges in integrating and prioritizing different data modalities, particularly when conflicting information is present, which could impact diagnostic effectiveness.
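
As a reader's consistency check on the reported figures, the following Python sketch recomputes sensitivity, specificity, accuracy, Cohen's Kappa, and a single-operating-point AUC from 2x2 confusion matrices. The counts are an assumption: they are not stated in the abstract and were reconstructed from the reported percentages (37 malignant and 69 benign nodules out of 106). The AUC here is taken as the mean of sensitivity and specificity, which holds for a single binary decision; whether the authors computed it this way is not stated. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Summary metrics for a 2x2 confusion matrix (malignant = positive)."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / total
    # With one binary call per nodule the ROC curve has a single operating
    # point, so the area under it is the mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "auc": auc,
        "kappa": kappa,
    }


# ASSUMPTION: counts reconstructed from the abstract's percentages,
# not taken from the paper's tables.
dual = binary_metrics(tp=26, fp=26, fn=11, tn=43)
triple = binary_metrics(tp=7, fp=2, fn=30, tn=67)

for name, result in (("dual", dual), ("triple", triple)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumed counts, the recomputed values round to the figures reported in the abstract for both approaches.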

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Schools > Healthcare Sciences
Additional Information: License information from Publisher: LICENSE 1: URL: https://creativecommons.org/licenses/by/4.0/, Start Date: 2025-06-20
Publisher: MDPI
Date of First Compliant Deposit: 8 July 2025
Date of Acceptance: 18 June 2025
Last Modified: 08 Jul 2025 09:00
URI: https://orca.cardiff.ac.uk/id/eprint/179620
