Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Leveraging electronic health record data to inform hospital resource management: A systematic data mining approach

Ferrão, José Carlos, Oliveira, Mónica Duarte, Gartner, Daniel, Janela, Filipe and Martins, Henrique 2021. Leveraging electronic health record data to inform hospital resource management: A systematic data mining approach. Health Care Management Science 24, pp. 716-741. 10.1007/s10729-021-09554-4

PDF - Accepted Post-Print Version

Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in the early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed a temporal analysis of model performance, analyzed the impact of training set homogeneity on performance and assessed the contribution of different EHR data elements to model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management from multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance feature selection) and bagged decision trees yielded the best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes and LOS amongst shorter-stay patients can be predicted with higher confidence in the early stages of patient stay. Lastly, value of information analysis indicated that diagnoses, medication and structured assessment forms were the most valuable EHR data elements in predicting managerial variables of interest through a data mining approach.
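
The feature selection stage named in the abstract, minimal redundancy maximum relevance (mRMR), can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete features, uses the common "relevance minus mean redundancy" form of the criterion, and runs on invented toy data. The feature names (`emergency_admission`, `emergency_flag_dup`, `on_anticoagulant`) and the `long_stay` target are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, target, k):
    """Greedy mRMR (difference form, an assumption of this sketch):
    repeatedly pick the feature whose relevance to the target, minus its
    mean redundancy with the already selected features, is largest."""
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: eight episodes, three binary features, binary target.
features = {
    "emergency_admission": [1, 1, 1, 1, 0, 0, 0, 0],
    "emergency_flag_dup":  [1, 1, 1, 1, 0, 0, 0, 1],  # near-copy of the first
    "on_anticoagulant":    [1, 1, 0, 0, 1, 1, 0, 0],
}
long_stay = [1, 1, 1, 1, 1, 0, 0, 0]

selected = mrmr_select(features, long_stay, k=2)
print(selected)
```

On this toy data, ranking by relevance alone would place the near-duplicate flag second. The redundancy penalty instead favours `on_anticoagulant`, which is less relevant on its own but shares no information with the first pick.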

Item Type: Article
Date Type: Publication
Status: Published
Schools: Mathematics
Subjects: R Medicine > R Medicine (General)
Publisher: Springer Verlag (Germany)
ISSN: 1386-9620
Date of First Compliant Deposit: 8 February 2021
Date of Acceptance: 2 February 2021
Last Modified: 24 May 2022 01:30
