Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Survival prediction in acute myeloid leukemia at distinct treatment time points: a performance comparison of random survival forest and Elastic-Net regularized Cox regression

Brady, Oisin Padraig, Fuentes Toro, Carolina ORCID: https://orcid.org/0000-0002-0871-939X, Johnson, Sean James, Giles, Peter ORCID: https://orcid.org/0000-0003-3143-6854, Alvares, Caroline ORCID: https://orcid.org/0000-0003-4391-9802 and Zabkiewicz, Joanna ORCID: https://orcid.org/0000-0003-0951-3825 2025. Survival prediction in acute myeloid leukemia at distinct treatment time points: a performance comparison of random survival forest and Elastic-Net regularized Cox regression. JMIR Bioinformatics and Biotechnology 10.2196/75678
Item availability restricted.

PDF - Accepted Post-Print Version (3MB)
Restricted to Repository staff only

PDF (Provisional file) - Accepted Post-Print Version (17kB)

Abstract

Background: Risk group stratification based on survival prediction for acute myeloid leukaemia (AML) patients is complex. Despite common risk group categorisation guidelines, overall prognosis remains poor. Machine learning (ML) techniques have been shown to provide more accurate risk group stratification than conventional approaches using trial data. However, many time-to-event models do not use training sets constrained to specific time windows, and instead use aggregations of trial data.
Objective: To evaluate the performance of (1) Random Survival Forest (RSF) and (2) Cox Proportional Hazards Regression (CPHR) with Elastic Net regularisation (CoxNet) for survival prediction of AML patients within a censoring window, with models trained on the data available at discrete time points during the AML17 randomised controlled trial.
Methods: For each stage in the AML17 trial, separate models were trained for each exhaustive k-choice combination of available AML17 data subsets. Data combinations for each model were further constrained according to the respective trial stage to avoid data leakage. A preliminary Pearson's correlation analysis was used to remove features directly correlated with the time-to-event target (time to death or the 5-year censoring point). Repeated stratified k-fold cross-validation was used on each dataset ablation to find candidate models. Permutation importance and Elastic Net regularisation were used to monitor stability across validation folds and to reduce the feature set of the highest-performing RSF and CPHR models for each stage, respectively. Finally, the selected ablated models were re-evaluated using nested, stratified k-fold cross-validation with bootstrapping.
Results: The concordance index was used to rank the best models for data restricted up to the end of induction (RSF: 0.68, CoxNet: 0.67) and to stages 1 (RSF: 0.69, CoxNet: 0.68), 2 (RSF: 0.68, CoxNet: 0.66), and 3 (RSF: 0.69, CoxNet: 0.63) of the trial.
Conclusion: This study details the high prediction accuracy achieved for time-to-survival-event predictions when the training sets of CoxNet and RSF models are sequentially restricted to data measured up to the end of the respective AML17 trial stages. The performance of these sequential time-to-event models is intended to justify their use as part of a wider digital twin system simulating multiple time-to-event outcomes for AML patients.
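
The concordance index reported in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `censor_at` and `concordance_index`, the handling of tied times, and the toy data are all assumptions made for illustration. The sketch computes Harrell's C for right-censored data after applying a fixed censoring horizon, comparable in spirit to the 5-year censoring point mentioned above.

```python
from itertools import combinations


def censor_at(times, events, horizon):
    """Apply a fixed censoring horizon (hypothetical helper).

    Subjects whose observed time exceeds the horizon are treated as
    censored at the horizon; earlier observations are left unchanged.
    """
    capped_times = [min(t, horizon) for t in times]
    capped_events = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return capped_times, capped_events


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times:       observed time (event or censoring) per subject
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only (times in years).
times = [1.0, 2.0, 3.0, 7.0]
events = [1, 0, 1, 1]
risk = [0.9, 0.2, 0.1, 0.6]

t5, e5 = censor_at(times, events, horizon=5.0)
c_index = concordance_index(t5, e5, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a rough frame of reference for the 0.63 to 0.69 range reported above.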

Item Type: Article
Status: In Press
Schools: Schools > Medicine
Schools > Computer Science & Informatics
Publisher: JMIR Publications
ISSN: 2563-3570
Funders: EPSRC
Date of First Compliant Deposit: 15 January 2026
Date of Acceptance: 30 December 2025
Last Modified: 29 Jan 2026 11:46
URI: https://orca.cardiff.ac.uk/id/eprint/183705
