Scientific Reports (Mar 2023)
Machine learning classifiers for screening nonalcoholic fatty liver disease in general adults
Abstract
Nonalcoholic fatty liver disease (NAFLD) is expected to be one of the major causes of end-stage liver disease in the coming decades, but it shows few symptoms until it progresses to cirrhosis. We aimed to develop machine learning classification models to screen for NAFLD among general adults. This study included 14,439 adults who underwent a health examination. We developed models to classify subjects as having or not having NAFLD using decision tree, random forest (RF), extreme gradient boosting (XGBoost) and support vector machine (SVM) algorithms. The SVM classifier showed the best performance, with the highest accuracy (0.801), positive predictive value (PPV) (0.795), F1 score (0.795), Kappa score (0.508) and area under the precision-recall curve (AUPRC) (0.712), and the second-highest area under the receiver operating characteristic curve (AUROC) (0.850). The second-best classifier was the RF model, which showed the highest AUROC (0.852) and the second-highest accuracy (0.789), PPV (0.782), F1 score (0.782), Kappa score (0.478) and AUPRC (0.708). In conclusion, the SVM classifier is the best of the four for screening NAFLD in the general population based on results from physical examination and blood testing, followed by the RF classifier. These classifiers have the potential to help physicians and primary care doctors screen for NAFLD in the general population, which could benefit NAFLD patients through early diagnosis.
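For readers less familiar with the threshold-based metrics reported above (accuracy, PPV, F1 score and Kappa score), the following minimal sketch shows how they can be computed from a binary confusion matrix, taking NAFLD as the positive class. The function name `screening_metrics` and the example counts are hypothetical and are not taken from the study; the abstract does not state how the metrics were averaged across classes, so the standard binary definitions are assumed here.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute accuracy, PPV, recall, F1 and Cohen's kappa from
    confusion-matrix counts (positive class = NAFLD).

    Hypothetical helper for illustration; not the authors' code.
    """
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp)        # positive predictive value (precision)
    recall = tp / (tp + fn)     # sensitivity
    f1 = 2 * ppv * recall / (ppv + recall)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal totals.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_pos + p_neg
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "ppv": ppv,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Made-up counts for 100 subjects, purely illustrative.
example = screening_metrics(tp=40, fp=10, fn=20, tn=30)
```

AUROC and AUPRC are not covered by this sketch because they are computed from the classifier's continuous scores across all thresholds and not from a single confusion matrix.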