Reproducibility of echocardiographic measurements of left ventricular systolic function: a systematic review and meta-analysis comparing artificial intelligence and clinician estimates

Roberts, Rebecca, Sanyaolu, Leigh, Sam, Christina, Farewell, Daniel, Edwards, Adrian and Davies, Rhodri H. 2025. Reproducibility of echocardiographic measurements of left ventricular systolic function: a systematic review and meta-analysis comparing artificial intelligence and clinician estimates. European Heart Journal – Digital Health, ztaf145. 10.1093/ehjdh/ztaf145

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.1093/ehjdh/ztaf145

Abstract

Background: Echocardiography underpins the diagnosis and management of cardiovascular disease, yet measurement variability can influence treatment decisions. Artificial intelligence (AI) may standardise interpretation, but its reproducibility and clinical impact require systematic evaluation.

Objective: To compare the reproducibility of AI-derived and clinician-derived measurements of left ventricular (LV) systolic function, specifically global longitudinal strain (GLS) and ejection fraction (EF), in adults.

Methods: We searched Medline, Embase, Web of Science and CENTRAL from inception to May 2025 for peer-reviewed studies assessing the reproducibility of AI-derived EF and/or GLS from two-dimensional (2D) or three-dimensional (3D) transthoracic echocardiography. Reporting quality was assessed with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Random-effects meta-analyses of intraclass correlation coefficients (ICCs) and Bland-Altman analyses compared the reproducibility of AI- and clinician-derived measures.

Results: Nineteen studies (17,984 participants; mean age 59 ± 8 years; 52.8% male) were included. Mean CLAIM adherence was 72.9%. Pooled ICCs demonstrated high reproducibility for both AI- and clinician-derived EF and GLS. Bland-Altman analyses showed limits of agreement of –13.4% to +12.7% for 2D EF and –4.3% to +2.3% for 2D GLS. 3D EF performed slightly better, with pooled limits of agreement of –11.26% to +12.61%. The pooled mean absolute differences (MADs) were 5.17% for 2D EF, 5.27% for 3D EF and 1.32% for 2D GLS.

Conclusion: AI-derived GLS and 3D EF achieve reproducibility comparable to, or exceeding, that of clinicians’ estimates. However, the limits of agreement between clinician and AI estimates are wide enough that reclassification around key thresholds is possible, which could affect patient management decisions. Large-scale, real-world validation remains essential to confirm generalisability.
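
To make the Bland-Altman and reclassification points in the abstract concrete, the sketch below computes the bias, the 95% limits of agreement (bias ± 1.96 × sample SD of the paired differences), the mean absolute difference, and the number of pairs that fall on opposite sides of a cut-off. The paired EF readings and the 40% threshold are hypothetical values chosen only for illustration; they are not data from the review, and the review's own pooling of limits across studies may be done differently.

```python
import statistics


def bland_altman(ai, clinician):
    """Return (bias, lower, upper) 95% limits of agreement for paired readings."""
    if len(ai) != len(clinician) or len(ai) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [a - c for a, c in zip(ai, clinician)]
    bias = statistics.mean(diffs)          # mean difference (AI minus clinician)
    sd = statistics.stdev(diffs)           # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def mean_absolute_difference(ai, clinician):
    """Average of |AI - clinician| across pairs."""
    return statistics.mean(abs(a - c) for a, c in zip(ai, clinician))


def reclassified(ai, clinician, threshold):
    """Count pairs where AI and clinician fall on opposite sides of the threshold."""
    return sum((a < threshold) != (c < threshold) for a, c in zip(ai, clinician))


# Hypothetical paired EF readings (%), illustration only
ai_ef = [38.0, 42.5, 55.0, 61.0, 35.5, 47.0]
clin_ef = [41.0, 39.0, 57.5, 58.0, 36.0, 52.0]

bias, lower, upper = bland_altman(ai_ef, clin_ef)
print(f"bias {bias:.2f}, 95% LoA {lower:.2f} to {upper:.2f}")
print("MAD", round(mean_absolute_difference(ai_ef, clin_ef), 2))
print("pairs reclassified at 40%:", reclassified(ai_ef, clin_ef, 40.0))
```

Even with a small bias, differences of a few percentage points are enough to move readings near a cut-off across it, which is the mechanism behind the reclassification concern in the Conclusion.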

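The abstract reports random-effects pooling of ICCs but does not state the estimator or the within-study variance used. One common choice, assumed here purely for illustration, is the DerSimonian-Laird estimator applied on the Fisher z scale with an approximate within-study variance of 1/(n − 3); that variance is strictly the one for a Pearson correlation, so applying it to ICCs is an approximation. The function name `pool_random_effects` and the example ICCs and sample sizes are hypothetical.

```python
import math


def pool_random_effects(iccs, ns):
    """DerSimonian-Laird random-effects pooling of correlation-type coefficients.

    Assumption: within-study variance on the Fisher z scale is 1/(n - 3).
    Returns (pooled, ci_low, ci_high, tau2), back-transformed to the ICC scale.
    """
    if len(iccs) != len(ns) or not iccs:
        raise ValueError("need one sample size per coefficient")
    if any(not -1.0 < r < 1.0 for r in iccs) or any(n <= 3 for n in ns):
        raise ValueError("coefficients must lie in (-1, 1) and n must exceed 3")
    z = [math.atanh(r) for r in iccs]      # Fisher z transform
    v = [1.0 / (n - 3) for n in ns]        # approximate within-study variances
    w = [1.0 / vi for vi in v]             # fixed-effect (inverse-variance) weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    df = len(z) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]          # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.tanh(z_re),
            math.tanh(z_re - 1.96 * se),
            math.tanh(z_re + 1.96 * se),
            tau2)


# Hypothetical study-level ICCs and sample sizes, illustration only
pooled, lo, hi, tau2 = pool_random_effects([0.91, 0.87, 0.95, 0.82], [120, 85, 200, 60])
print(f"pooled ICC {pooled:.3f} (95% CI {lo:.3f} to {hi:.3f}), tau^2 = {tau2:.4f}")
```

Working on the z scale keeps the pooled estimate and its interval inside (−1, 1) after back-transformation; when the studies are homogeneous, tau² falls to zero and the result reduces to a fixed-effect pool.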
Item Type:	Article
Date Type:	Published Online
Status:	In Press
Schools:	Medicine
Publisher:	Oxford University Press
Date of First Compliant Deposit:	16 December 2025
Date of Acceptance:	12 November 2025
Last Modified:	16 Dec 2025 15:30
URI:	https://orca.cardiff.ac.uk/id/eprint/183294
