Advanced Intelligent Systems (Dec 2023)

Machine Learning and Bioinformatics Analysis for Laboratory Data in Pan‐Cancers Detection

  • Yin Jia,
  • Zixin Liu,
  • Jie Guo,
  • Chengwen He,
  • Xinru Zhou,
  • Miao Xue,
  • Tian Nie,
  • Tingting Sun,
  • Jinsong Kang,
  • Qiong Lu,
  • Lei Jiang,
  • Shanrong Liu

DOI
https://doi.org/10.1002/aisy.202300283
Journal volume & issue
Vol. 5, no. 12

Abstract

Early diagnosis of cancer is crucial to improving patients' long‐term survival. However, commonly used tumor markers lack the sensitivity and specificity needed for screening. Herein, 10 diagnostic models for 10 common types of cancer are developed using extreme gradient boosting (XGBoost), incorporating 66 laboratory parameters. The datasets consist of a retrospective cohort of 737 503 training and 184 012 validation cases, and a prospective cohort of 174 894 cases for model testing. The areas under the curve (AUCs) of the 10 diagnostic models range from 0.763 to 0.993. Notably, the models have varying numbers of parameters in common among the 66 test features. Additionally, SHapley Additive exPlanation (SHAP) analysis reveals that 54 nontumor markers contribute significantly to the models. Cosine similarity analysis and clustering analysis demonstrate that some of the 10 cancers share common pathophysiological characteristics. Feature‐based inference graph models are therefore constructed to infer relationships between nontumor index parameters and the cancers with which they are strongly correlated. In conclusion, this study establishes a machine learning‐based pan‐cancer early warning system that can guide doctors in selecting more accurate testing indicators and in assessing the risk of 10 types of cancer with greater precision.
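The abstract does not state which vectors were compared by cosine similarity or which clustering algorithm was used. As a minimal sketch only, assuming each cancer model is summarized by a per‐feature importance vector (for example, mean absolute SHAP values over the laboratory parameters), such a comparison could look like the following. The function names, the toy profiles, and the 0.9 threshold are hypothetical, and single‐linkage agglomeration is a stand‐in for whatever clustering the authors actually used.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(profiles):
    """Pairwise cosine similarities between named importance profiles, keyed by (name_a, name_b)."""
    names = list(profiles)
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }


def single_linkage_clusters(profiles, threshold):
    """Hypothetical helper: merge groups while any cross-group pair has similarity >= threshold.

    This is single-linkage agglomerative clustering on cosine similarity,
    used here only as an illustrative substitute for an unspecified method.
    """
    clusters = [{name} for name in profiles]
    sims = similarity_matrix(profiles)

    def sim(a, b):
        # The matrix stores each unordered pair once, in insertion order.
        return sims[(a, b)] if (a, b) in sims else sims[(b, a)]

    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(sim(a, b) >= threshold for a in clusters[i] for b in clusters[j]):
                clusters[i] |= clusters[j]  # absorb group j into group i
                del clusters[j]             # j > i, so index i stays valid
                merged = True
                break                       # restart the scan after every merge
    return clusters


# Invented importance profiles over 4 hypothetical features (not study data).
toy_profiles = {
    "cancer_A": [0.9, 0.1, 0.0, 0.2],
    "cancer_B": [0.8, 0.2, 0.1, 0.1],
    "cancer_C": [0.0, 0.1, 0.9, 0.7],
    "cancer_D": [0.1, 0.0, 0.8, 0.9],
}

if __name__ == "__main__":
    print(single_linkage_clusters(toy_profiles, threshold=0.9))
```

With a threshold of 0.9, the toy profiles form two groups, {cancer_A, cancer_B} and {cancer_C, cancer_D}, because their within‐group similarities are about 0.98 while all cross‐group similarities stay below 0.25. These numbers are invented for illustration and are not results from the study.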

Keywords