BMC Medical Genomics (Oct 2023)

A novel proteomic-based model for predicting colorectal cancer with Schistosoma japonicum co‐infection by integrated bioinformatics analysis and machine learning

  • Shan Li,
  • Xuguang Sun,
  • Ting Li,
  • Yanqing Shi,
  • Binjie Xu,
  • Yuyong Deng,
  • Sifan Wang

DOI
https://doi.org/10.1186/s12920-023-01711-8
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 13

Abstract

Schistosoma japonicum infection is an important public health problem and is associated with a variety of diseases, including colorectal cancer (CRC). We collected paraffin samples from CRC patients with or without S. japonicum infection according to standard procedures. Data-independent acquisition (DIA) was used to identify differentially expressed proteins (DEPs). Protein–protein interaction (PPI) network construction, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis, and machine learning algorithms (least absolute shrinkage and selection operator (LASSO) regression) were then used to identify candidate genes for diagnosing CRC with S. japonicum infection. To assess diagnostic value, a nomogram and a receiver operating characteristic (ROC) curve were developed. A total of 115 DEPs were screened. According to the analyses, these DEPs were mostly related to the following biological processes: generation of precursor metabolites and energy, energy derivation by oxidation of organic compounds, carboxylic acid metabolic process, oxoacid metabolic process, cellular respiration, and aerobic respiration. Enrichment analysis showed that the DEPs might regulate oxidoreductase activity, transporter activity, transmembrane transporter activity, ion transmembrane transporter activity, and inorganic molecular entity transmembrane transporter activity. Following PPI network construction and LASSO regression, 13 genes (hsd17b4, h2ac4, hla-c, pc, epx, rpia, tor1aip1, mindy1, dpysl5, nucks1, cnot2, ndufa13 and dnm3) were identified, and 3 candidate hub genes were chosen for nomogram building and diagnostic value evaluation after machine learning. The nomogram and all 3 candidate hub genes (hsd17b4, rpia and cnot2) had high diagnostic value (area under the curve, 0.9556).
The results of our study indicate that the combination of hsd17b4, rpia, and cnot2 may serve as a predictive model for the occurrence of CRC with S. japonicum co-infection. This study also provides new clues for research into the mechanisms linking S. japonicum infection and CRC.
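
The abstract names two generic techniques, LASSO feature selection and ROC evaluation, without giving implementation details. The following is only a minimal sketch of how those two steps fit together, not the authors' pipeline. It assumes an L1-penalised least-squares fit on 0/1 labels solved by cyclic coordinate descent (the study may well have used a logistic LASSO in a dedicated package), and it computes the area under the ROC curve by pairwise ranking. The toy matrix `X`, the labels `y`, the penalty `lam`, and all function names are hypothetical and carry no data from the study.

```python
# Illustrative only: toy data, hypothetical names, not the study's method or data.

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return beta

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy design: 4 samples, 3 centred features; only feature 0 tracks the label.
X = [[1, 1, 1],
     [1, -1, -1],
     [-1, 1, -1],
     [-1, -1, 1]]
y = [1, 1, 0, 0]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]   # features kept by the L1 penalty
scores = [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]
auc = roc_auc(y, scores)
print("coefficients:", beta, "selected:", selected, "AUC:", auc)
```

On real data the penalty would normally be chosen by cross-validation, and the AUC would be estimated on samples not used for feature selection; a toy fit evaluated on its own training data, as here, says nothing about diagnostic performance.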

Keywords