Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Training data distribution significantly impacts the estimation of tissue microstructure with machine learning

Gyori, N.G., Palombo, M. (ORCID: https://orcid.org/0000-0003-4892-7967), Clark, C.A., Zhang, H. and Alexander, D.C. 2022. Training data distribution significantly impacts the estimation of tissue microstructure with machine learning. Magnetic Resonance in Medicine 87 (2), pp. 932-947. doi:10.1002/mrm.29014

PDF - Published Version
Available under License Creative Commons Attribution.

Abstract

Purpose: Supervised machine learning (ML) provides a compelling alternative to traditional model fitting for parameter mapping in quantitative MRI. The aim of this work is to demonstrate and quantify the effect of different training data distributions on the accuracy and precision of parameter estimates when supervised ML is used for fitting.

Methods: We fit two- and three-compartment biophysical models to diffusion measurements from the in vivo human brain, as well as to simulated diffusion data, using both traditional model fitting and supervised ML. For supervised ML, we train several artificial neural networks, as well as random forest regressors, on different distributions of ground-truth parameters. We compare the accuracy and precision of parameter estimates obtained from the different estimation approaches using synthetic test data.

Results: When the distribution of parameter combinations in the training set matches that observed in healthy human data sets, we observe high precision but inaccurate estimates for atypical parameter combinations. In contrast, when training data are sampled uniformly from the entire plausible parameter space, estimates tend to be more accurate for atypical parameter combinations but may have lower precision for typical parameter combinations.

Conclusion: This work highlights that estimation of model parameters using supervised ML depends strongly on the training-set distribution. We show that high precision obtained using ML may mask strong bias, and that visual assessment of the parameter maps is not sufficient for evaluating the quality of the estimates.
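The precision/bias trade-off described in the abstract can be illustrated with a deliberately small toy sketch. This is not the authors' code or model: the one-parameter forward model (a parameter theta in [0, 1] measured with additive Gaussian noise), the noise level, the "typical" training prior centred at 0.3, and the use of k-nearest-neighbour regression in place of the paper's neural networks and random forests are all illustrative assumptions, and every function name below is hypothetical. The sketch trains the same regressor on two training distributions and reports the bias and spread of its estimates at a typical value (0.3) and an atypical value (0.8).

```python
import heapq
import random
import statistics

# Toy forward model (hypothetical, NOT the paper's biophysical model):
# a single tissue parameter theta in [0, 1] is measured directly with
# additive Gaussian noise of this standard deviation.
NOISE_SD = 0.15


def simulate_signal(theta: float, rng: random.Random) -> float:
    """Return one noisy measurement of theta."""
    return theta + rng.gauss(0.0, NOISE_SD)


def sample_uniform(rng: random.Random) -> float:
    """Training prior 1: uniform over the whole plausible range [0, 1]."""
    return rng.uniform(0.0, 1.0)


def sample_typical(rng: random.Random) -> float:
    """Training prior 2 (assumed): concentrated around a 'healthy' value of 0.3."""
    return min(1.0, max(0.0, rng.gauss(0.3, 0.05)))


def make_training_set(sampler, n: int, rng: random.Random):
    """Draw ground-truth parameters from `sampler` and simulate their signals."""
    pairs = []
    for _ in range(n):
        theta = sampler(rng)
        pairs.append((simulate_signal(theta, rng), theta))
    return pairs


def knn_predict(train, signal: float, k: int = 50) -> float:
    """k-NN regression: average theta of the k training signals closest to `signal`."""
    nearest = heapq.nsmallest(k, train, key=lambda pair: abs(pair[0] - signal))
    return statistics.fmean(theta for _, theta in nearest)


def evaluate(train, true_theta: float, n_test: int, rng: random.Random):
    """Return (bias, standard deviation) of the estimates for one ground-truth value."""
    estimates = [knn_predict(train, simulate_signal(true_theta, rng)) for _ in range(n_test)]
    return statistics.fmean(estimates) - true_theta, statistics.stdev(estimates)


def run(seed: int = 0, n_train: int = 3000, n_test: int = 200):
    """Train on each prior and evaluate at a typical (0.3) and an atypical (0.8) value."""
    rng = random.Random(seed)
    results = {}
    for name, sampler in (("uniform", sample_uniform), ("typical", sample_typical)):
        train = make_training_set(sampler, n_train, rng)
        for true_theta in (0.3, 0.8):
            results[(name, true_theta)] = evaluate(train, true_theta, n_test, rng)
    return results


if __name__ == "__main__":
    for (name, theta), (bias, sd) in sorted(run().items()):
        print(f"trained on {name:8s} true={theta:.1f} bias={bias:+.3f} sd={sd:.3f}")
```

Under these assumptions, the regressor trained on the concentrated prior should give a much smaller spread at 0.3 but a large negative bias at 0.8 (its estimates are pulled back towards the training mean), while the uniformly trained regressor should be less biased at 0.8 but noisier at 0.3. This mirrors, in miniature, the pattern reported in the Results; a tight spread alone therefore says little about whether the estimates are accurate.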

Item Type: Article
Date Type: Publication
Status: Published
Schools: Psychology
Additional Information: This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Publisher: Wiley
ISSN: 0740-3194
Date of First Compliant Deposit: 2 March 2022
Date of Acceptance: 30 August 2021
Last Modified: 12 May 2023 10:28
URI: https://orca.cardiff.ac.uk/id/eprint/147875

Citation Data

Cited 8 times in Scopus.