EClinicalMedicine (Apr 2023)
Development and validation of sex-specific hip fracture prediction models using electronic health records: a retrospective, population-based cohort study
Abstract
Summary
Background: Hip fracture is associated with immobility, morbidity, mortality, and high medical costs. Because dual-energy X-ray absorptiometry (DXA) is not widely available, hip fracture prediction models that do not rely on bone mineral density (BMD) data are essential. We aimed to develop and validate 10-year sex-specific hip fracture prediction models using electronic health records (EHR) without BMD.
Methods: In this retrospective, population-based cohort study, anonymized medical records were retrieved from the Clinical Data Analysis and Reporting System for public healthcare service users in Hong Kong aged ≥60 years as of 31 December 2005. A total of 161,051 individuals (91,926 female; 69,125 male) with complete follow-up from 1 January 2006 until the study end date of 31 December 2015 were included in the derivation cohort. The sex-stratified derivation cohort was randomly divided into training (80%) and internal testing (20%) datasets. An independent validation cohort comprised 3,046 community-dwelling participants aged ≥60 years as of 31 December 2005 from the Hong Kong Osteoporosis Study, a prospective cohort that recruited participants between 1995 and 2010. Using 395 potential predictors (age, diagnosis, and drug prescription records from EHR), 10-year sex-specific hip fracture prediction models were developed in the training cohort using logistic regression (LR) with stepwise selection and four machine learning (ML) algorithms (gradient boosting machine, random forest, eXtreme gradient boosting, and single-layer neural networks). Model performance was evaluated in both the internal and independent validation cohorts.
Findings: In females, the LR model had the highest area under the receiver operating characteristic curve (AUC) (0.815; 95% confidence interval [CI]: 0.805–0.825) and adequate calibration in internal validation. Reclassification metrics showed that the LR model had better discrimination and classification performance than the ML algorithms. The LR model performed similarly in independent validation, with a high AUC (0.841; 95% CI: 0.807–0.87) comparable to that of the ML algorithms. In males, the LR model had a high AUC in internal validation (0.818; 95% CI: 0.801–0.834), was adequately calibrated, and outperformed all ML models on reclassification metrics. In independent validation, the LR model had a high AUC (0.898; 95% CI: 0.857–0.939) comparable to that of the ML algorithms, and reclassification metrics showed that it had the best discrimination performance.
Interpretation: Even without BMD data, the 10-year hip fracture prediction models developed by conventional LR had better discrimination performance than the models developed by ML algorithms. After further validation in independent cohorts, the LR models could be integrated into routine clinical workflow to help identify high-risk individuals for DXA scanning.
Funding: Health and Medical Research Fund, Health Bureau, Hong Kong SAR Government (reference: 17181381).
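The evaluation steps described in the abstract (a random 80%/20% split within each sex, then an AUC with a 95% CI on the held-out data) can be illustrated with a minimal standard-library Python sketch. This is not the study's code: the function names `split_by_sex`, `auc`, and `bootstrap_auc_ci` are hypothetical, the toy scores and labels are made up, and the percentile bootstrap is an assumption, since the abstract does not state how the confidence intervals were computed.

```python
import random


def split_by_sex(sexes, test_fraction=0.2, seed=0):
    """Randomly split record indices into train/test within each sex group."""
    rng = random.Random(seed)
    train, test = [], []
    for sex in sorted(set(sexes)):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs in which the case
    has the higher predicted risk; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


if __name__ == "__main__":
    # Toy data only: predicted risks and observed 10-year hip fracture labels.
    toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
    toy_labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
    print("AUC:", auc(toy_scores, toy_labels))
    print("95% CI:", bootstrap_auc_ci(toy_scores, toy_labels))
```

The pairwise AUC is quadratic in sample size, which is fine for illustration but would be replaced by a rank-based computation on a cohort of this study's size.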