Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Machine learning for genetic prediction of schizophrenia

Smith, Matthew 2021. Machine learning for genetic prediction of schizophrenia. PhD Thesis, Cardiff University.
Item availability restricted.

PDF (2021SmithM PhD.pdf, 24MB) - Accepted Post-Print Version

The complexity of schizophrenia raises a formidable challenge. Its diverse genetic architecture, the influence of environmental factors from the prenatal period through to adolescence, and the absence of a laboratory-based diagnostic test complicate efforts to "carve nature at its joints". Twinned with attempts to disentangle schizophrenia’s origins are those aiming to predict it. Prediction is essential to precision psychiatry and to attempts to improve patient outcomes. Genetic prediction only became feasible relatively recently, following the discovery of robust risk loci in association studies. Polygenic risk scoring (PRS) is a popular method which relies on univariable tests of association and typically assumes additivity within and between loci, but explains only a small fraction of liability to schizophrenia. Machine learning (ML) methods, which learn predictive patterns from labelled data, have evolved out of the artificial intelligence and statistics communities. They are an enticing option in genetics, as they allow for multivariable predictive modelling, can capture complex predictor relationships including interactions, and can learn from datasets where the number of predictors exceeds the number of observations. However, their predictive performance in schizophrenia is largely unknown. The ability of penalised logistic regression, support vector machines, random forests (RFs), gradient boosting machines (GBMs) and neural networks to predict schizophrenia from genetic data was investigated. A review systematically assessed predictive performance and methodology in machine learning applied to psychiatric disorders, finding poor reporting, widespread inadequate modelling approaches and high risk of bias. Simulations assessed performance in the presence of additive or interaction effects. Flexible ML approaches including RFs and GBMs performed best under interactions, but worse than PRS and sparse linear models for additive effects.
Evaluation in real data assessed modelling procedures including calibration and deconfounding. Prediction was maximised when combining genetic and non-genetic factors; no evidence was found to support choosing machine learning approaches over logistic regression or PRS.
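
The additivity assumption behind polygenic risk scoring, as described in the abstract, can be illustrated with a minimal sketch. It is not taken from the thesis: the function name polygenic_risk_score, the allele counts and the effect sizes are all hypothetical, and a real analysis would typically also involve steps such as variant selection and quality control that are omitted here.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Additive score for one individual (illustrative sketch only).

    allele_counts: risk-allele counts per variant (0, 1 or 2).
    effect_sizes: per-variant weights, e.g. log odds ratios taken from
        univariable association tests (hypothetical values here).
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per variant")
    # Additivity within a locus: each extra risk allele adds the same weight.
    # Additivity between loci: contributions are summed, with no interaction terms.
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


# Hypothetical toy data: three individuals genotyped at three variants.
effect_sizes = [0.10, -0.05, 0.20]
individuals = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
]
scores = [polygenic_risk_score(counts, effect_sizes) for counts in individuals]
```

Because the score is a plain weighted sum, it cannot represent interactions between loci, which is the kind of structure the more flexible ML approaches named above are able to model.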

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Medicine
Date of First Compliant Deposit: 3 September 2021
Last Modified: 03 Sep 2021 09:13
