Hepatocellular Carcinoma Risk Prediction in the NIH-AARP Diet and Health Study Cohort: A Machine Learning Approach

Thomas J; Liao LM; Sinha R; Patel T; Antwi SO

Journal of Hepatocellular Carcinoma (Feb 2022)

Journal volume & issue: Vol. 9
pp. 69–81

Abstract

Jonathan Thomas,1 Linda M Liao,2 Rashmi Sinha,2 Tushar Patel,1,* Samuel O Antwi3,*

1Department of Transplantation, Mayo Clinic, Jacksonville, FL, USA; 2Division of Cancer Epidemiology and Genetics, The National Cancer Institute, Bethesda, MD, USA; 3Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, USA

*These authors contributed equally to this work

Correspondence: Samuel O Antwi, Department of Quantitative Health Sciences, Mayo Clinic, 4500 San Pablo Road South, Vincent Stabile Building 756N, Jacksonville, FL, 32224, USA, Tel +1-904-953-0310, Fax +1-904-953-1447, Email [email protected]

Prediction of hepatocellular carcinoma (HCC) development in persons with known risk factors remains a challenge and is an urgent unmet need, considering projected increases in HCC incidence and mortality in the US. We aimed to use machine learning techniques to identify a set of demographic, lifestyle, and health history information that can be used simultaneously for population-level HCC risk prediction.

Methods: Data from 377,065 participants of the NIH-AARP Diet and Health Study, among whom 647 developed HCC over 16 years of follow-up, were analyzed. The sample was randomly divided into independent training (60%) and validation (40%) sets. We evaluated 123 participant characteristics and tested 15 different machine learning algorithms for robustness in predicting HCC risk. Separately, we evaluated variables selected from multivariable logistic regression for risk prediction.

Results: The random under-sampling boosting (RUSBoost) algorithm performed best during model testing. Fourteen participant characteristics were selected for risk prediction based on differences between cases and controls (Bonferroni-corrected p-values < 0.0004) and on the variables used most frequently in the first two decision trees of the RUSBoost learner. A predictive model based on the 14 variables had an AUC of 0.72 (sensitivity=0.68, specificity=0.63) and an independent validation AUC of 0.65 (sensitivity=0.68, specificity=0.63). A subset of 9 variables identified through logistic regression also had an AUC of 0.72 (sensitivity=0.67, specificity=0.63) and an independent validation AUC of 0.65 (sensitivity=0.70, specificity=0.61).

Conclusion: Population-level HCC risk prediction can be performed with a machine learning-based algorithm and could inform strategies for improving HCC risk reduction in at-risk groups.

Keywords: HCC, hepatocellular carcinoma, liver cancer, machine learning, risk prediction
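As a rough illustration of the figures quoted in the abstract, the sketch below works through the arithmetic behind a Bonferroni cutoff and the cohort's class imbalance, and shows how sensitivity and specificity are defined. It is not the authors' code. The family-wise alpha of 0.05 and the choice of 123 as the number of tests are assumptions, chosen because 0.05 / 123 is approximately 0.0004, which is consistent with the reported cutoff. The function names are hypothetical.

```python
# Figures quoted in the abstract.
N_PARTICIPANTS = 377_065
N_HCC_CASES = 647
N_CANDIDATE_VARIABLES = 123

# Assumption (not stated in the abstract): family-wise alpha of 0.05,
# divided evenly across all candidate variables.
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


threshold = bonferroni_threshold(ALPHA, N_CANDIDATE_VARIABLES)
case_fraction = N_HCC_CASES / N_PARTICIPANTS
controls_per_case = (N_PARTICIPANTS - N_HCC_CASES) / N_HCC_CASES

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"HCC cases: {case_fraction:.3%} of the cohort")
print(f"Controls per case: {controls_per_case:.0f}")
```

With fewer than two HCC cases per thousand participants, a classifier trained on the raw data could reach high accuracy by predicting "no HCC" for everyone. That imbalance is the usual motivation for under-sampling the majority class, as RUSBoost does.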

Published in Journal of Hepatocellular Carcinoma

ISSN: 2253-5969 (Online)
Publisher: Dove Medical Press
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://www.dovepress.com/journal-of-hepatocellular-carcinoma-journal
