BMC Gastroenterology (Jul 2025)
Explainable machine learning model for predicting the transarterial chemoembolization response and subtypes of hepatocellular carcinoma patients
Abstract
Background Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death globally, and responses to transarterial chemoembolization (TACE) in intermediate-stage disease are heterogeneous. We developed a machine learning (ML)-based model that integrates routine clinical variables to predict TACE efficacy preoperatively, supporting tailored selection of TACE candidates and optimized therapeutic decision-making.
Methods This retrospective multicentre study enrolled treatment-naive HCC patients undergoing initial TACE in two independent cohorts: the First Affiliated Hospital of Wenzhou Medical University (training cohort) and Wenzhou Central Hospital (external validation cohort). After selecting predictors by recursive feature elimination (RFE), we developed prediction models with ten distinct ML algorithms. The SHAP algorithm was used to make the models interpretable, and patients were then stratified using PCA and K-means clustering for prognostic analysis.
Results We retrospectively included 382 patients with unresectable HCC from the First Affiliated Hospital of Wenzhou Medical University and 52 from Wenzhou Central Hospital. RFE identified 10 predictors for constructing the ML models. XGBoost and CatBoost outperformed the other algorithms, achieving AUCs of 0.796–0.799 (internal test) and 0.785–0.791 (external validation) with balanced accuracy (76–76.8%). SHAP analysis identified tumor burden and hepatic function markers as key determinants of TACE resistance. K-means clustering stratified patients into two prognostically distinct subgroups: Cluster B showed significantly longer survival than Cluster A (HR = 0.36, 95% CI: 0.26–0.49, P < 0.001), supporting the clinical relevance of the ML-selected features.
Conclusion We developed and validated an interpretable ML-based system integrating predictive modelling and patient clustering to individualize TACE efficacy prediction and clinical risk stratification for HCC patients.
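The workflow summarized above (RFE, a boosted-tree classifier, then PCA and K-means) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the cohort size and the choice of 10 predictors are simply borrowed from the abstract, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost and CatBoost. The SHAP step is omitted, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a clinical table (assumption): 382 patients,
# 30 candidate variables, binary label for TACE response.
X, y = make_classification(
    n_samples=382, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination: repeatedly fit the estimator and drop
# the least important features until 10 predictors remain.
selector = RFE(
    GradientBoostingClassifier(random_state=0),
    n_features_to_select=10,
    step=2,
)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.support_)

# Fit a boosted-tree classifier on the selected predictors and score it
# on the held-out split with the area under the ROC curve.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train[:, selected], y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test[:, selected])[:, 1])

# Unsupervised stratification: standardize the selected predictors,
# project onto two principal components, then split into two clusters.
scaled = StandardScaler().fit_transform(X[:, selected])
components = PCA(n_components=2, random_state=0).fit_transform(scaled)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)

print(f"selected feature indices: {selected.tolist()}")
print(f"held-out AUC: {auc:.3f}")
print(f"cluster sizes: {np.bincount(clusters).tolist()}")
```

In a real analysis the cluster labels would then be compared on survival, for example with a Cox model to obtain a hazard ratio such as the one reported above. That step needs follow-up data and is outside this sketch.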
Keywords