Prediction of Driver Gene Matching in Lung Cancer NOG/PDX Models Based on Artificial Intelligence
Yayi He,
Haoyue Guo,
Li Diao,
Yu Chen,
Junjie Zhu,
Hiran C. Fernando,
Diego Gonzalez Rivas,
Hui Qi,
Chunlei Dai,
Xuzhen Tang,
Jun Zhu,
Jiawei Dai,
Kan He,
Dan Chan,
Yang Yang
Affiliations
Yayi He
School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Haoyue Guo
School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Li Diao
Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Yu Chen
Spine Center, Orthopedic Department, Shanghai Changzheng Hospital, Shanghai 200003, China
Junjie Zhu
Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Hiran C. Fernando
Department of Thoracic Surgery, Allegheny General Hospital, Pittsburgh, PA 15212, USA
Diego Gonzalez Rivas
Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China; Department of Thoracic Surgery and Minimally Invasive Thoracic Surgery Unit (UCTMI), Coruña University Hospital, Coruña 15006, Spain
Hui Qi
Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Chunlei Dai
Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Xuzhen Tang
Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Jun Zhu
School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Jiawei Dai
SJTU–Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
Kan He
SJTU–Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
Dan Chan
Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Yang Yang
Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China; School of Materials Science and Engineering, Tongji University, Shanghai 201804, China. Corresponding author.
Patient-derived tumor xenografts (PDXs) are a powerful tool for drug discovery and screening in cancer. However, genotype mismatches in PDXs remain poorly understood, and such mismatches cause massive economic losses. Here, we established PDX models from 53 lung cancer patients, with a genotype matching rate of 79.2% (42/53). Furthermore, 17 clinicopathological features were examined and entered into stepwise logistic regression (LR) models selected by the lowest Akaike information criterion (AIC), least absolute shrinkage and selection operator (LASSO)-LR, support vector machine (SVM) recursive feature elimination (SVM-RFE), extreme gradient boosting (XGBoost), and gradient boosting with categorical features (CatBoost) models; the synthetic minority oversampling technique (SMOTE) was also applied. Finally, the performance of all models was evaluated by accuracy, area under the receiver operating characteristic curve (AUC), and F1 score in 100 testing groups. Two multivariable LR models revealed that age, number of driver gene mutations, epidermal growth factor receptor (EGFR) gene mutations, type of prior chemotherapy, prior tyrosine kinase inhibitor (TKI) therapy, and the source of the sample were powerful predictors. Moreover, CatBoost (mean accuracy = 0.960; mean AUC = 0.939; mean F1 score = 0.908) and the eight-feature SVM-RFE (mean accuracy = 0.950; mean AUC = 0.934; mean F1 score = 0.903) showed the best performance among the algorithms. Meanwhile, application of SMOTE improved the predictive capability of most models; CatBoost was the exception. With SMOTE, the ensemble classifier of single models achieved the highest accuracy (mean = 0.975), AUC (mean = 0.949), and F1 score (mean = 0.938). In conclusion, we established an optimal predictive model to screen lung cancer patients for non-obese diabetic (NOD)/Shi-scid, interleukin-2 receptor (IL-2R) γnull (NOG)/PDX models and offer a general approach for building predictive models.
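As an illustration of how the three reported metrics (accuracy, AUC, and F1 score) can be averaged over repeated testing groups, the following is a minimal sketch in plain Python. It is not the implementation used in this study: the function names, the 0.5 decision threshold, and the two toy testing groups are assumptions made purely for illustration, and the numbers are invented rather than taken from the 53-patient cohort. Label 1 stands for a genotype-matched PDX and label 0 for a mismatched one.

```python
from statistics import mean


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_over_groups(groups, threshold=0.5):
    """Average the three metrics over a list of (y_true, scores) testing groups.

    The threshold of 0.5 is an assumption for this sketch.
    """
    accs, aucs, f1s = [], [], []
    for y_true, scores in groups:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        accs.append(accuracy(y_true, y_pred))
        aucs.append(auc(y_true, scores))
        f1s.append(f1_score(y_true, y_pred))
    return {"accuracy": mean(accs), "auc": mean(aucs), "f1": mean(f1s)}


# Two invented testing groups (hypothetical labels and predicted probabilities).
toy_groups = [
    ([1, 1, 1, 0], [0.9, 0.8, 0.4, 0.3]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.2, 0.1]),
]
summary = evaluate_over_groups(toy_groups)
print(summary)
```

In a setting like the one described above, `toy_groups` would be replaced by the 100 testing groups, each holding the true matching labels and one model's predicted probabilities. Because matched cases outnumber mismatched ones (42 versus 11), accuracy alone can look favorable for a model that rarely predicts a mismatch, which is one reason to report AUC and F1 score alongside it.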