Scientific Reports (Jul 2022)
Application of machine learning methods for the prediction of true fasting status in patients performing blood tests
Abstract
The fasting blood glucose (FBG) values extracted from electronic medical records (EMR) are assumed to be valid in existing research, which may cause diagnostic bias due to misclassification of fasting status. We proposed a machine learning (ML) algorithm to predict the fasting status of blood samples. This cross-sectional study was conducted using the EMR of a medical center from 2003 to 2018, and a total of 2,196,833 ontological FBGs from the outpatient service were included. The theoretical true fasting status was identified by comparing the values of ontological FBG with average glucose levels derived from concomitantly tested HbA1c, based on multiple criteria. In addition to multiple logistic regression, we extracted 67 features to predict the fasting status by eXtreme Gradient Boosting (XGBoost). The discrimination and calibration of the prediction models were also assessed. Real-world performance was gauged by the prevalence of ineffective glucose measurement (IGM). Of the 784,340 ontologically labeled fasting samples, 77.1% were considered theoretical FBGs. The median (IQR) glucose and HbA1c levels of ontological and theoretical fasting samples in patients without diabetes mellitus (DM) were 94.0 (87.0, 102.0) mg/dL and 5.6 (5.4, 5.9)%, and 92.0 (86.0, 99.0) mg/dL and 5.6 (5.4, 5.9)%, respectively. In the parsimonious approach, XGBoost showed calibration comparable to that of multiple logistic regression, with an AUROC of 0.887 versus 0.868, and identified glucose level, home-to-hospital distance, age, and concomitant serum creatinine and lipid testing as important predictors. The prevalence of IGM dropped from 27.8% based on ontological FBGs to 0.48% when algorithm-verified FBGs were used. The proposed ML algorithm or the multiple logistic regression model aids in verification of the fasting status.
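The comparison between a recorded glucose value and the average glucose implied by HbA1c can be illustrated with a minimal sketch. The abstract does not state which conversion or which criteria were used, so the sketch assumes the widely used ADAG linear formula, eAG (mg/dL) = 28.7 × HbA1c (%) − 46.7, and a single hypothetical rule with an illustrative `margin_mg_dl` parameter. The function names are likewise illustrative and are not taken from the study.

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """Convert HbA1c (%) to estimated average glucose (mg/dL).

    Assumption: the ADAG linear formula; the study's own conversion
    is not given in the abstract.
    """
    return 28.7 * hba1c_percent - 46.7


def plausibly_fasting(glucose_mg_dl: float,
                      hba1c_percent: float,
                      margin_mg_dl: float = 0.0) -> bool:
    """Hypothetical single-criterion check (not the study's multi-criteria rule).

    A sample labeled as fasting is treated as plausible when its glucose
    does not exceed the HbA1c-derived average glucose plus a margin.
    """
    return glucose_mg_dl <= estimated_average_glucose(hba1c_percent) + margin_mg_dl


# Example with the median values reported for patients without DM.
eag = estimated_average_glucose(5.6)
print(f"eAG for HbA1c 5.6%: {eag:.1f} mg/dL")
print("92 mg/dL plausible as fasting:", plausibly_fasting(92.0, 5.6))
print("180 mg/dL plausible as fasting:", plausibly_fasting(180.0, 5.6))
```

A single threshold of this kind is only a starting point; the study combines several criteria and then trains prediction models on 67 features, which this sketch does not attempt to reproduce.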