JMIR Medical Informatics (Jun 2022)

Noninvasive Diagnosis of Nonalcoholic Steatohepatitis and Advanced Liver Fibrosis Using Machine Learning Methods: Comparative Study With Existing Quantitative Risk Scores

  • Yonghui Wu,
  • Xi Yang,
  • Heather L Morris,
  • Matthew J Gurka,
  • Elizabeth A Shenkman,
  • Kenneth Cusi,
  • Fernando Bril,
  • William T Donahoo

DOI
https://doi.org/10.2196/36997
Journal volume & issue
Vol. 10, no. 6
p. e36997

Abstract

Read online

BackgroundNonalcoholic steatohepatitis (NASH), advanced fibrosis, and subsequent cirrhosis and hepatocellular carcinoma are becoming the most common etiology for liver failure and liver transplantation; however, they can only be diagnosed at these potentially reversible stages with a liver biopsy, which is associated with various complications and high expenses. Knowing the difference between the more benign isolated steatosis and the more severe NASH and cirrhosis informs the physician regarding the need for more aggressive management. ObjectiveWe intend to explore the feasibility of using machine learning methods for noninvasive diagnosis of NASH and advanced liver fibrosis and compare machine learning methods with existing quantitative risk scores. MethodsWe conducted a retrospective analysis of clinical data from a cohort of 492 patients with biopsy-proven nonalcoholic fatty liver disease (NAFLD), NASH, or advanced fibrosis. We systematically compared 5 widely used machine learning algorithms for the prediction of NAFLD, NASH, and fibrosis using 2 variable encoding strategies. Then, we compared the machine learning methods with 3 existing quantitative scores and identified the important features for prediction using the SHapley Additive exPlanations method. ResultsThe best machine learning method, gradient boosting (GB), achieved the best area under the curve scores of 0.9043, 0.8166, and 0.8360 for NAFLD, NASH, and advanced fibrosis, respectively. GB also outperformed 3 existing risk scores for fibrosis. Among the variables, alanine aminotransferase (ALT), triglyceride (TG), and BMI were the important risk factors for the prediction of NAFLD, whereas aspartate transaminase (AST), ALT, and TG were the important variables for the prediction of NASH, and AST, hyperglycemia (A1c), and high-density lipoprotein were the important variables for predicting advanced fibrosis. ConclusionsIt is feasible to use machine learning methods for predicting NAFLD, NASH, and advanced fibrosis using routine clinical data, which potentially can be used to better identify patients who still need liver biopsy. Additionally, understanding the relative importance and differences in predictors could lead to improved understanding of the disease process as well as support for identifying novel treatment options.