Scientific Reports (Dec 2023)
Machine learning for predicting hepatitis B or C virus infection in diabetic patients
Abstract
Hepatitis B virus (HBV) and hepatitis C virus (HCV) infections have been reported to be highly prevalent among individuals with diabetes. Because hepatitis is frequently asymptomatic and screening is challenging in some vulnerable populations, such as patients with diabetes, we investigated the performance of several machine learning models for identifying hepatitis in patients with diabetes and evaluated the importance of the features. Using NHANES data from 2013 to 2018, we evaluated random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), and least absolute shrinkage and selection operator (LASSO) models, along with a stacked ensemble model. We performed hyperparameter tuning to improve model performance and identified important predictors using the best-performing model. LASSO showed the highest predictive performance among the models evaluated (AUC-ROC = 0.810). Illicit drug use, poverty, and race ranked highly as predictive factors for hepatitis in patients with diabetes. Our study demonstrated that a machine-learning-based model can detect hepatitis among patients with diabetes with high performance. Furthermore, we expect that the models and predictors evaluated in the current study could provide supporting information for developing screening or treatment methods for hepatitis care in patients with diabetes.
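
For readers unfamiliar with the AUC-ROC metric used to compare the models, the following is a minimal sketch of how it can be computed from predicted risk scores. It uses the rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are illustrative assumptions. They are not the study's data or code, and the study's own software is not stated here.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC: P(score of a positive > score of a negative).

    labels: iterable of 0/1 outcomes (1 = hepatitis-positive in this toy setup)
    scores: iterable of predicted risk scores, same length as labels
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from NHANES): 3 positives, 5 negatives.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.6]

toy_auc = auc_roc(labels, scores)
print(round(toy_auc, 3))  # 0.9 for this toy data
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported 0.810 means that the model ranked a positive case above a negative case about 81% of the time.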