Heliyon (Dec 2024)
Identification of biomarkers related to Escherichia coli infection for the diagnosis of gastrointestinal tumors applying machine learning methods
Abstract
Background: Escherichia coli (E. coli) is part of the normal gastrointestinal microbiota, but it can also cause human gastrointestinal diseases. Understanding the role of E. coli in the progression of gastrointestinal tumors may provide novel prevention and treatment strategies. Methods: The E. coli infection score was calculated by single-sample gene set enrichment analysis (ssGSEA). Weighted correlation network analysis (WGCNA) and differentially expressed gene (DEG) analysis were used to identify genes related to E. coli infection in gastrointestinal tumors. Hub genes were selected by machine learning methods to establish a diagnostic model. The diagnostic performance of the model was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) and validated in three external datasets. After the biomarkers were determined, immune infiltration analysis and GSEA were performed. The mRNA expression of the biomarkers in stomach adenocarcinoma (STAD) cells and the invasion and migration of the tumor cells were assessed in in vitro experiments. Results: The E. coli infection score was lower in tumor samples than in normal samples. Eight hub genes were selected from a total of 28 genes associated with E. coli-related dysbiosis in gastrointestinal tumors to establish an accurate diagnostic model. The AUC values of PRKCB and IL16 exceeded 0.7 in all three external datasets, and their mRNA expression patterns were consistent with the TCGA cohort; therefore, PRKCB and IL16 were selected as the diagnostic biomarkers. PRKCB and IL16 showed significant positive correlations with most immune cells, and inflammation-related pathways were activated in the high-expression groups of PRKCB and IL16. Moreover, IL16 was highly expressed whereas PRKCB was expressed at low levels in STAD cells, and silencing IL16 suppressed the invasion and migration of STAD cells. Conclusions: Overall, we identified and validated eight robust genes related to E. coli using bioinformatics and machine learning algorithms, providing a theoretical foundation for the relationship between E. coli-related dysbiosis and gastrointestinal tumors.
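The single-gene ROC evaluation mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `roc_auc` is hypothetical, the expression values are synthetic, and the only figure taken from the abstract is the 0.7 AUC cutoff. The AUC is computed here in its pairwise form, as the fraction of tumor/normal sample pairs in which the tumor sample has the higher value.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive sample outscores a
    random negative one (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic expression values for one hypothetical gene (not data from the study).
expression = [2.1, 3.4, 1.8, 5.0, 4.2, 4.9, 2.9, 5.5]
is_tumor = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = tumor sample, 0 = normal sample

auc = roc_auc(expression, is_tumor)

# A gene that is lower in tumors yields AUC < 0.5, so its diagnostic
# value is taken as max(auc, 1 - auc) before applying the cutoff.
diagnostic_auc = max(auc, 1.0 - auc)
passes_threshold = diagnostic_auc > 0.7  # cutoff quoted in the abstract
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; for cohort-sized data a rank-based implementation or an established library routine would be the more practical choice.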