Hospital administrative data is a valuable source for measuring myocardial infarction (MI) rates. However, admission counts can be inflated when a patient is transferred between hospitals during a single episode of care, because each transfer generates an additional admission record. The variables that record transfers may also be unreliable. Hospital transfers therefore need to be identified correctly to obtain an accurate count of events.
Objectives and Approach
We assessed how well multivariable logistic regression and several machine-learning models predict transfers in hospital administrative data. Using Western Australian linked hospital data, we identified records from 2000-2016 with a principal discharge diagnosis of MI. The reference method was a 24-hour look-back, which identified a transfer using only the admission and separation dates of the current and previous records for the same patient. Multivariable logistic regression and tree-based ensemble models, including gradient-boosted decision trees and random forests, were used to predict whether a single record was a transfer from variables recorded at admission (e.g. age, sex, type of hospital, source of admission, emergency or elective admission). Model performance was assessed using metrics that included the area under the receiver operating characteristic curve (AUC).
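A 24-hour look-back of this kind can be sketched in Python as follows. The abstract does not give the exact rule, so this is a minimal sketch under stated assumptions. Only dates (not times) are used, so the 24-hour window is approximated as an admission on the same day as, or the day after, the patient's latest earlier separation. Overlapping stays are also treated as transfers. The `Record` type, the `flag_transfers` and `count_events` functions, and the `window_days` parameter are hypothetical names chosen for illustration. Counting only the records that are not flagged shows how collapsing transfers keeps event counts from being inflated.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Record:
    record_id: str
    patient_id: str  # linkage key: only available in linked data
    admission: date
    separation: date


def flag_transfers(records, window_days=1):
    """Map record_id -> True if the record looks like a transfer.

    Assumption: with dates only, the 24-hour look-back is approximated as an
    admission no more than `window_days` days after the patient's latest
    earlier separation; overlapping stays (negative gap) also count.
    """
    flags = {}
    ordered = sorted(records, key=lambda r: (r.patient_id, r.admission, r.separation))
    for _, stays in groupby(ordered, key=lambda r: r.patient_id):
        latest_sep = None  # latest separation date seen so far for this patient
        for rec in stays:
            flags[rec.record_id] = (
                latest_sep is not None
                and (rec.admission - latest_sep).days <= window_days
            )
            if latest_sep is None or rec.separation > latest_sep:
                latest_sep = rec.separation
    return flags


def count_events(records, window_days=1):
    """Count MI events as records not flagged as transfers (one per episode)."""
    flags = flag_transfers(records, window_days)
    return sum(not flags[r.record_id] for r in records)


if __name__ == "__main__":
    records = [
        Record("a1", "P1", date(2010, 3, 1), date(2010, 3, 2)),
        Record("a2", "P1", date(2010, 3, 2), date(2010, 3, 9)),  # same-day transfer
        Record("a3", "P1", date(2010, 6, 1), date(2010, 6, 5)),  # new episode
        Record("b1", "P2", date(2011, 1, 10), date(2011, 1, 12)),
    ]
    print(flag_transfers(records))  # {'a1': False, 'a2': True, 'a3': False, 'b1': False}
    print(len(records), "records ->", count_events(records), "events")  # 4 records -> 3 events
```

In this toy example, four records collapse to three events because the second admission for patient P1 begins on the day the first stay ends. A rule like this needs `patient_id`, and so it only works on linked data.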
Results
Records in the training, validation and testing samples had similar characteristics: mean age 68.9 years, 66% male and 58% admitted to tertiary hospitals. The gradient boosting decision tree model (AUC = 0.887; 95% CI: 0.886-0.887) outperformed multivariable logistic regression (AUC = 0.875; 95% CI: 0.869-0.881) and random forest (AUC = 0.859; 95% CI: 0.853-0.865) models.
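Each AUC reported above measures how well a model ranks true transfers above non-transfers. The sketch below computes the AUC directly as the probability that a randomly chosen transfer receives a higher score than a randomly chosen non-transfer, with ties counted as one half. The abstract does not state how the 95% confidence intervals were obtained. The sketch therefore assumes a percentile bootstrap over records, which is one common choice and not necessarily the authors' method. The function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """ROC AUC: probability that a random positive (transfer) outscores a
    random negative (non-transfer); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the study's).

    Records are resampled with replacement; resamples containing only one
    class are skipped because the AUC is undefined for them.
    """
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(labels, scores))  # 0.75
    print(bootstrap_auc_ci(labels, scores, n_boot=200))
```

The pairwise loop takes time proportional to the product of the positive and negative counts, so it only suits small illustrations. For samples the size of a linked hospital dataset, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be more practical.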
Conclusion / Implications
Multivariable logistic regression and machine-learning models can identify transfers using only the variables available in a single record. They can therefore be applied to unlinked hospital administrative data, where records belonging to the same patient cannot be identified.
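The models use only variables recorded on the admission itself, so a fitted model can score a record that carries no patient identifier. The sketch below illustrates this with a plain, unpenalised logistic regression fitted by batch gradient descent in pure Python. The feature names, the toy records, and the helpers `fit_logistic` and `transfer_probability` are hypothetical. They are not the study's variable list, data or fitted model, and a real analysis would normally use a statistics or machine-learning library.

```python
import math

# Hypothetical binary features recorded on the admission itself; these names
# are illustrative and are not the study's actual variable list.
FEATURES = ("admitted_from_other_hospital", "emergency_admission", "tertiary_hospital")


def _vector(record):
    """Feature vector with a leading 1.0 for the intercept."""
    return [1.0] + [float(record[name]) for name in FEATURES]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(records, labels, lr=0.5, n_iter=2000):
    """Fit an unpenalised logistic regression by batch gradient descent
    on the mean log-loss. Returns [intercept, coef_1, ..., coef_k]."""
    xs = [_vector(r) for r in records]
    n, k = len(xs), len(xs[0])
    w = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for x, y in zip(xs, labels):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(k):
                grad[j] += err * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def transfer_probability(weights, record):
    """Score a single record on its own: no patient ID or linkage needed."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(weights, _vector(record))))


if __name__ == "__main__":
    # Toy records (not real data): 1 = transfer, 0 = not a transfer.
    train = [
        {"admitted_from_other_hospital": 1, "emergency_admission": 1, "tertiary_hospital": 1},
        {"admitted_from_other_hospital": 1, "emergency_admission": 0, "tertiary_hospital": 1},
        {"admitted_from_other_hospital": 1, "emergency_admission": 1, "tertiary_hospital": 0},
        {"admitted_from_other_hospital": 0, "emergency_admission": 1, "tertiary_hospital": 1},
        {"admitted_from_other_hospital": 0, "emergency_admission": 1, "tertiary_hospital": 0},
        {"admitted_from_other_hospital": 0, "emergency_admission": 0, "tertiary_hospital": 0},
    ]
    labels = [1, 1, 1, 0, 0, 0]
    weights = fit_logistic(train, labels)
    # An unlinked record: only its own admission variables are available.
    unlinked = {"admitted_from_other_hospital": 1, "emergency_admission": 0, "tertiary_hospital": 1}
    print(round(transfer_probability(weights, unlinked), 3))
```

Fitting the model requires labelled records, such as records labelled by a look-back rule in linked data. Applying it requires only `transfer_probability` and the record's own fields, which is why this approach transfers to unlinked datasets.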
This work is licensed under a Creative Commons Attribution 4.0 International License.