Roberts, Rebecca, Sanyaolu, Leigh ORCID: https://orcid.org/0000-0002-6762-6986, Sam, Christina, Farewell, Daniel ORCID: https://orcid.org/0000-0002-8871-1653, Edwards, Adrian ORCID: https://orcid.org/0000-0002-6228-4446 and Davies, Rhodri H.
2025.
Reproducibility of echocardiographic measurements of left ventricular systolic function: a systematic review and meta-analysis comparing artificial intelligence and clinician estimates.
European Heart Journal – Digital Health, ztaf145.
10.1093/ehjdh/ztaf145
PDF (Accepted Post-Print Version, 1MB). Available under License Creative Commons Attribution.
Abstract
Background: Echocardiography underpins diagnosis and management of cardiovascular disease, yet measurement variability can influence treatment decisions. Artificial Intelligence (AI) may standardise interpretation, but its reproducibility and clinical impact require systematic evaluation.

Objective: To compare the reproducibility of AI-derived and clinician-derived measurements of left ventricular (LV) systolic function, specifically Global Longitudinal Strain (GLS) and Ejection Fraction (EF), in adults.

Methods: We searched Medline, Embase, Web of Science and CENTRAL from inception to May 2025 for peer-reviewed studies assessing reproducibility of AI-derived EF and/or GLS from two-dimensional (2D) or three-dimensional (3D) transthoracic echocardiography. Reporting quality was assessed with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Random-effects meta-analyses of Intraclass Correlation Coefficients (ICCs) and Bland-Altman plots compared reproducibility of AI- and clinician-derived measures.

Results: Nineteen studies (17,984 participants; mean age 59 ± 8 years, 52.8% male) were included. Mean CLAIM adherence was 72.9%. Pooled ICCs demonstrated high reproducibility for both AI- and clinician-derived EF and GLS. Bland-Altman analyses showed limits of agreement of –13.4% to +12.7% for 2D EF and –4.3% to +2.3% for 2D GLS. 3D EF was slightly better, showing pooled limits of agreement of 11.26 to 12.61%. The pooled mean absolute differences (MAD) were 5.17% for 2D EF, 5.27% for 3D EF and 1.32% for 2D GLS.

Conclusion: AI-derived GLS and 3D EF achieve reproducibility comparable to, or exceeding, clinicians’ estimates. However, limits of agreement between clinician and AI estimates are sufficiently wide that reclassification is possible around key thresholds which could affect patient management decisions. Large-scale, real-world validation remains essential to confirm generalisability.
| Item Type: | Article |
|---|---|
| Date Type: | Published Online |
| Status: | In Press |
| Schools: | Schools > Medicine |
| Publisher: | Oxford University Press |
| Date of First Compliant Deposit: | 16 December 2025 |
| Date of Acceptance: | 12 November 2025 |
| Last Modified: | 16 Dec 2025 15:30 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/183294 |