Prediction of Driver Gene Matching in Lung Cancer NOG/PDX Models Based on Artificial Intelligence

Yayi He; Haoyue Guo; Li Diao; Yu Chen; Junjie Zhu; Hiran C. Fernando; Diego Gonzalez Rivas; Hui Qi; Chunlei Dai; Xuzhen Tang; Jun Zhu; Jiawei Dai; Kan He; Dan Chan; Yang Yang

Engineering (Aug 2022)

Affiliations

Yayi He: School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Haoyue Guo: School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Li Diao: Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Yu Chen: Spine Center, Orthopedic Department, Shanghai Changzheng Hospital, Shanghai 200003, China
Junjie Zhu: Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Hiran C. Fernando: Department of Thoracic Surgery, Allegheny General Hospital, Pittsburgh, PA 15212, USA
Diego Gonzalez Rivas: Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China; Department of Thoracic Surgery and Minimally Invasive Thoracic Surgery Unit (UCTMI), Coruña University Hospital, Coruña 15006, Spain
Hui Qi: Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Chunlei Dai: Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Xuzhen Tang: Oncology and Immunology BU, Research Service Division, WuXi Apptec, Shanghai 200131, China
Jun Zhu: School of Medicine, Tongji University, Shanghai 200092, China; Department of Medical Oncology, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China
Jiawei Dai: SJTU–Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
Kan He: SJTU–Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
Dan Chan: Division of Medical Oncology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
Yang Yang: Department of Thoracic Surgery, Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai 200433, China; School of Materials Science and Engineering, Tongji University, Shanghai 201804, China; Corresponding author.

Journal volume & issue: Vol. 15, pp. 102–114

Abstract

Patient-derived tumor xenografts (PDXs) are a powerful tool for drug discovery and screening in cancer. However, genotype mismatches in PDXs remain poorly understood and lead to massive economic losses. Here, we established PDX models from 53 lung cancer patients with a genotype matching rate of 79.2% (42/53). Furthermore, 17 clinicopathological features were examined and entered into stepwise logistic regression (LR) models selected by the lowest Akaike information criterion (AIC), least absolute shrinkage and selection operator (LASSO)-LR, support vector machine (SVM) recursive feature elimination (SVM-RFE), extreme gradient boosting (XGBoost), gradient boosting with categorical features (CatBoost), and the synthetic minority oversampling technique (SMOTE). Finally, the performance of all models was evaluated by accuracy, area under the receiver operating characteristic curve (AUC), and F1 score in 100 testing groups. Two multivariable LR models revealed that age, number of driver gene mutations, epidermal growth factor receptor (EGFR) gene mutations, type of prior chemotherapy, prior tyrosine kinase inhibitor (TKI) therapy, and the source of the sample were powerful predictors. Moreover, CatBoost (mean accuracy = 0.960; mean AUC = 0.939; mean F1 score = 0.908) and the eight-feature SVM-RFE (mean accuracy = 0.950; mean AUC = 0.934; mean F1 score = 0.903) showed the best performance among the algorithms. Meanwhile, applying SMOTE improved the predictive capability of most models, with the exception of CatBoost. With SMOTE, the ensemble classifier of single models achieved the highest accuracy (mean = 0.975), AUC (mean = 0.949), and F1 score (mean = 0.938). In conclusion, we established an optimal predictive model to screen lung cancer patients for non-obese diabetic (NOD)/Shi-scid, interleukin-2 receptor (IL-2R) γnull (NOG)/PDX models and offer a general approach for building predictive models.
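The three evaluation metrics named in the abstract, and the interpolation idea behind SMOTE, can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names, the toy labels and scores, the choice of label 1 as the positive class, and the neighbour count k are all assumptions made for illustration, and the study itself presumably relied on established implementations.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (hypothetical helper): each synthetic point
    lies on the segment between a minority sample and one of its k nearest
    minority neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

if __name__ == "__main__":
    # Matching rate reported in the abstract: 42 matched models out of 53.
    print(f"matching rate: {42 / 53:.1%}")

    # Toy labels and scores (illustrative only, not study data).
    y_true = [1, 0, 1, 1]
    y_pred = [1, 0, 0, 1]
    scores = [0.9, 0.1, 0.4, 0.8]
    print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))

    # Oversample a tiny two-feature minority class.
    print(smote_like([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3))
```

In a study like this one, such metrics would be computed on each of the 100 testing groups and then averaged, and oversampling would be applied only to the training portion of each split so that synthetic samples never leak into the test data.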

Published in Engineering

ISSN: 2095-8099 (Print); 2096-0026 (Online)
Publisher: Elsevier
Country of publisher: China
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/engineering
