Cancer Control (Apr 2022)

Development and Validation of Machine Learning Models to Predict Epidermal Growth Factor Receptor Mutation in Non-Small Cell Lung Cancer: A Multi-Center Retrospective Radiomics Study

  • Yafeng Liu MD,
  • Jiawei Zhou MD,
  • Jing Wu PhD,
  • Wenyang Wang PhD,
  • Xueqin Wang MD,
  • Jianqiang Guo MD,
  • Qingsen Wang MD,
  • Xin Zhang MD,
  • Danting Li MD,
  • Jun Xie MD,
  • Xuansheng Ding PhD,
  • Yingru Xing PhD,
  • Dong Hu PhD

DOI
https://doi.org/10.1177/10732748221092926
Journal volume & issue
Vol. 29

Abstract

Objective: To develop and validate a generalized prediction model that can classify epidermal growth factor receptor (EGFR) mutation status in patients with non–small cell lung cancer.

Methods: A total of 346 patients (296 in the training cohort and 50 in the validation cohort) from four centers were included in this retrospective study. First, 1085 features were extracted from the computed tomography images using IBEX. The features were screened using the intraclass correlation coefficient, hypothesis tests, and the least absolute shrinkage and selection operator (LASSO). Logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM) were used to build radiomics models for classification. The models were evaluated using the following metrics: area under the curve (AUC), calibration curve (CAL), decision curve analysis (DCA), concordance index (C-index), and Brier score.

Results: Sixteen features were selected, and models were built using LR, DT, RF, and SVM. In the training cohort, the AUCs were .723, .842, .995, and .883, respectively; in the validation cohort, the AUCs were .658, .567, .88, and .765. The RF model had the best AUC, and its CAL, C-index (training cohort = .998; validation cohort = .883), and Brier score (training cohort = .007; validation cohort = .137) showed satisfactory predictive accuracy; DCA indicated that the RF model has better clinical application value.

Conclusion: Machine learning models based on computed tomography images can be used to evaluate EGFR status in patients with non–small cell lung cancer, and the RF model outperformed LR, DT, and SVM.
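
Two of the metrics named in the abstract, AUC and Brier score, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc` and `brier_score` and the toy labels and probabilities are assumptions made only for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) definition.

```python
def auc(labels, scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome (0 or 1).
    Lower is better; 0 means perfect probabilistic predictions."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical toy data (NOT from the study): 1 = EGFR mutant, 0 = wild type.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

print(f"AUC = {auc(y_true, y_prob):.3f}")
print(f"Brier score = {brier_score(y_true, y_prob):.3f}")
```

Read this way, the reported RF Brier scores (.007 in training, .137 in validation) are mean squared errors of the predicted mutation probabilities, so the gap between the two cohorts reflects how much less accurate those probabilities are on unseen patients.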