Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Automated development of clinical prediction models using genetic programming

Bannister, Christian. 2015. Automated development of clinical prediction models using genetic programming. PhD Thesis, Cardiff University.
Item availability restricted.

Genetic programming is an evolutionary computing technique, inspired by biological evolution, capable of discovering complex non-linear patterns in large datasets. It is a general methodology whose implementation requires several problem-specific elements to be developed, such as problem representation, fitness, selection and genetic variation. Despite the potential advantages of genetic programming over standard statistical methods, its applications to survival analysis are at best rare, primarily because of the difficulty in handling censored data. The aim of this work was to develop a genetic programming approach for survival analysis and demonstrate its utility for the automatic development of clinical prediction models using cardiovascular disease as a case study. We developed a tree-based untyped steady-state genetic programming approach for censored longitudinal data and compared its performance to the de facto statistical method, Cox regression, in developing clinical prediction models for future cardiovascular events in patients with symptomatic and asymptomatic cardiovascular disease, using large observational datasets. We also used genetic programming to examine the prognostic significance of different risk factors, together with their non-linear combinations, for health outcomes in cardiovascular disease. These experiments showed that Cox regression and the developed steady-state genetic programming approach produced similar results when evaluated in common validation datasets. Despite slight relative differences, both approaches demonstrated an acceptable level of discrimination and calibration at a range of time points.
Whilst the application of genetic programming did not provide more accurate representations of factors that predict the risk of both symptomatic and asymptomatic cardiovascular disease when compared with existing methods, genetic programming did offer comparable performance. Despite generally comparable performance, albeit slightly in favour of the Cox model, the predictors selected for representing their relationships with the outcome were quite different and, on average, the models developed using genetic programming used considerably fewer predictors. The results of the genetic programming confirm the prognostic significance of a small number of the most highly associated predictors in the Cox modelling: age, previous atherosclerosis, and albumin for secondary prevention; and age, recorded diagnosis of 'other' cardiovascular disease, and ethnicity for primary prevention in patients with type 2 diabetes. When considered as a whole, genetic programming did not produce better-performing clinical prediction models; rather, it used fewer predictors, most of which were the predictors that Cox regression estimated to be most strongly associated with the outcome, whilst achieving comparable performance. This suggests that genetic programming may better represent the potentially non-linear relationship of (a smaller subset of) the strongest predictors. To our knowledge, this work is the first study to develop a genetic programming approach for censored longitudinal data and assess its value for clinical prediction in comparison with the well-known and widely applied Cox regression technique. Using empirical data, this work has demonstrated that clinical prediction models developed by steady-state genetic programming have predictive ability comparable to those developed using Cox regression.
The genetic programming models were more complex and thus more difficult for domain experts to validate; however, these models were developed in an automated fashion, using fewer input variables, without the domain-specific knowledge and expertise required to appropriately perform survival analysis. This work has demonstrated the strong potential of genetic programming as a methodology for automated development of clinical prediction models for diagnostic and prognostic purposes in the presence of censored data. This work compared untuned genetic programming models that were developed in an automated fashion with highly tuned Cox regression models that were developed in a very involved manner requiring a certain amount of clinical and statistical expertise. Whilst the highly tuned Cox regression models performed slightly better in validation data, the performance of the automatically generated genetic programming models was generally comparable. The comparable performance demonstrates the utility of genetic programming for clinical prediction modelling and prognostic research, where the primary goal is accurate prediction. In aetiological research, where the primary goal is to examine the relative strength of association between risk factors and the outcome, Cox regression and its variants remain the de facto approach.
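
The abstract reports discrimination for models fitted to censored data but does not name the measure used. As an illustration only, the sketch below computes Harrell's concordance index, a widely used discrimination measure for right-censored outcomes. It is not taken from the thesis: the function name `concordance_index`, its arguments, and the simplified handling of tied follow-up times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up time for each patient
    events      -- 1 if the event was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is usable only when the shorter time is an observed event.
        # Pairs with tied times are skipped here (a simplifying assumption).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
mixed = concordance_index(times, events, [0.9, 0.1, 0.5, 0.7])
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 is what random ranking would give. Because censored patients contribute only to pairs in which the other patient had an earlier observed event, a score of this kind can be computed without discarding censored records, which is one reason it could plausibly serve as a fitness or validation measure in this setting.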

Item Type: Thesis (PhD)
Date Type: Publication
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date of First Compliant Deposit: 12 May 2016
Last Modified: 19 May 2023 01:19
