Frontiers in Medicine (Nov 2024)

Machine learning based ultrasomics noninvasive predicting EGFR expression status in hepatocellular carcinoma patients

  • Yujing Ma,
  • Shaobo Duan,
  • Shanshan Ren,
  • Didi Bu,
  • Yahong Li,
  • Xiguo Cai,
  • Lianzhong Zhang

DOI
https://doi.org/10.3389/fmed.2024.1483291
Journal volume & issue
Vol. 11

Abstract

Objective: To investigate the ability of ultrasomics to noninvasively predict epidermal growth factor receptor (EGFR) expression status in patients with hepatocellular carcinoma (HCC).

Methods: A total of 198 HCC patients were included in the study (n = 138 in the training dataset and n = 60 in the test dataset). EGFR expression was detected by immunohistochemistry. Ultrasomics features were extracted from gray-scale ultrasound images. Intra-class correlation coefficient (ICC) screening, variance filtering, the mutual information method, and the extreme gradient boosting (XGBoost) embedding method were applied to select the best features. Five machine learning algorithms, namely random forest (RF), XGBoost, support vector machine (SVM), decision tree (DT), and logistic regression (LR), were used to construct clinical models, ultrasomics models, and clinical-ultrasomics combined models. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, decision curve analysis (DCA), and calibration curves were used to assess the predictive performance of the models.

Results: Of the 198 patients, 100 showed high EGFR expression and 98 showed low EGFR expression. The RF ultrasomics model performed well: in the training and test datasets, respectively, the AUC was 0.929 (95% CI, 0.874–0.966) and 0.807 (95% CI, 0.684–0.897), the sensitivity was 0.843 and 0.767, the specificity was 0.857 and 0.800, and the accuracy was 0.850 and 0.783. Integrating ultrasomics features with clinical baseline characteristics improved predictive performance: for the RF combined model in the training and test datasets, respectively, the AUC was 0.937 (95% CI, 0.884–0.971) and 0.822 (95% CI, 0.702–0.909), the sensitivity was 0.857 and 0.833, the specificity was 0.857 and 0.800, and the accuracy was 0.857 and 0.817.

Conclusion: The ultrasomics models and combined models built with the five machine learning algorithms can serve as efficient, noninvasive tools for predicting EGFR expression status in HCC patients, and the models built with the RF classifier showed the best predictive performance.
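For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix, with "positive" taken to mean high EGFR expression. The abstract does not report the confusion matrix, so the counts used here are hypothetical: they are one possible split of the 60 test patients (assumed 30 high, 30 low) that happens to reproduce the rounded test-set values of the RF ultrasomics model. The class name `ConfusionCounts` is likewise illustrative and not from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; 'positive' means high EGFR expression."""
    tp: int  # high EGFR, predicted high
    fn: int  # high EGFR, predicted low
    tn: int  # low EGFR, predicted low
    fp: int  # low EGFR, predicted high

    def sensitivity(self) -> float:
        # Fraction of truly positive cases that the model flags as positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of truly negative cases that the model flags as negative.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all cases classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical counts (an assumption, not data from the paper): 60 test
# patients, assumed to be split 30 high / 30 low EGFR expression.
test_counts = ConfusionCounts(tp=23, fn=7, tn=24, fp=6)

print(f"sensitivity = {test_counts.sensitivity():.3f}")
print(f"specificity = {test_counts.specificity():.3f}")
print(f"accuracy    = {test_counts.accuracy():.3f}")
```

Other splits of the test set could round to the same values, so this is only a plausibility check on the arithmetic, not a reconstruction of the study's actual predictions.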

Keywords