Development and implementation of patient-level prediction models of end-stage renal disease for type 2 diabetes patients using fast healthcare interoperability resources

San Wang; Jieun Han; Se Young Jung; Tae Jung Oh; Sen Yao; Sanghee Lim; Hee Hwang; Ho-Young Lee; Haeun Lee

doi:10.1038/s41598-022-15036-6

Scientific Reports (Jul 2022)

Affiliations

San Wang: Enolink
Jieun Han: Department of Family Medicine, Seoul National University Bundang Hospital
Se Young Jung: Department of Family Medicine, Seoul National University Bundang Hospital
Tae Jung Oh: Department of Internal Medicine, Seoul National University Bundang Hospital
Sen Yao: Enolink
Sanghee Lim: Enolink
Hee Hwang: Department of Digital Healthcare, Seoul National University Bundang Hospital
Ho-Young Lee: Department of Digital Healthcare, Seoul National University Bundang Hospital
Haeun Lee: Department of Digital Healthcare, Seoul National University Bundang Hospital

Journal volume & issue: Vol. 12, no. 1
pp. 1 – 9

Abstract

This study aimed to develop a machine learning (ML) model to predict the 5-year risk of end-stage renal disease (ESRD) in patients with type 2 diabetes mellitus (T2DM), and to implement the resulting algorithm in an electronic medical record (EMR) system using Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR). The final dataset used for modeling included 19,159 patients. The medical data were engineered into several types of features, which were then input into multiple ML classifiers. XGBoost performed best, with an area under the receiver operating characteristic curve (AUROC) of 0.95 and an area under the precision-recall curve (AUPRC) of 0.79 under three-fold cross-validation; the other models, including logistic regression, random forest, and support vector machine, achieved AUROCs of 0.929–0.943 and AUPRCs of 0.765–0.792. The highest-ranked features for ESRD risk prediction were serum creatinine, serum albumin, the urine albumin-to-creatinine ratio, the Charlson comorbidity index, estimated glomerular filtration rate (GFR), and medication days of insulin. The algorithm was implemented in the EMR system using HL7 FHIR through an ML-dedicated server that preprocessed unstructured data and trained on updated data.
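For readers unfamiliar with the two metrics reported above, the following is a minimal pure-Python sketch of how AUROC and AUPRC can be computed from binary labels and predicted risk scores. It is not the authors' code: the function names `auroc` and `average_precision` and the toy data are hypothetical, AUPRC is approximated here by the common step-wise "average precision" estimate, and the sketch assumes no tied scores for that second metric.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values measured at each positive case,
    with cases ranked by descending score (assumes no tied scores)."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical toy data: 1 = developed ESRD within 5 years, 0 = did not.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]

print("AUROC:", round(auroc(toy_labels, toy_scores), 3))
print("AUPRC (average precision):", round(average_precision(toy_labels, toy_scores), 3))
```

Because ESRD is a relatively rare outcome, AUPRC tends to be the more demanding of the two metrics: it depends on how many of the highest-scored patients are true cases, whereas AUROC weights every positive-negative pair equally.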

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
