Emerging Science Journal (Apr 2025)

Genetic Links Between Common Lung Diseases and Lung Cancer Progression: Bioinformatics and Machine Learning Insights

  • Md Ali Hossain,
  • Tania Akter Asa,
  • Md. Zulfiker Mahmud,
  • AKM Azad,
  • Mohammad Zahidur Rahman,
  • Mohammad Ali Moni,
  • Ahmed Moustafa

DOI
https://doi.org/10.28991/esj-2025-09-02-021
Journal volume & issue
Vol. 9, no. 2
pp. 916 – 937

Abstract


Lung cancer (LC) is one of the most frequently diagnosed cancers and remains the leading cause of cancer-related mortality worldwide, making it a significant global health challenge. Although many common lung diseases (CLDs) are implicated in LC development, the mechanisms by which LC arises from CLDs remain poorly understood. To explore LC progression from CLDs, we integrated bioinformatics and machine learning using data from the GEO and TCGA databases. We first identified differentially expressed genes (DEGs) in LC and in each CLD. A gene-disease network then revealed, for the first time, the DEGs that LC shares with tuberculosis (TB, 36 genes), asthma (10), pneumonia (17), chronic obstructive pulmonary disease (COPD, 18), and idiopathic pulmonary fibrosis (IPF, 78), pointing to potential molecular connections between LC and CLDs. Building on these shared DEGs, we identified significant pathways and, through a protein-protein interaction (PPI) network, the hub proteins SPTBN1, KCNA4, SCN7A, KCNQ3, GRIA1, and SDC1. RNA-seq and clinical data for the shared DEGs were then obtained from cBioPortal, and the integrated data were analyzed with univariate and multivariate Cox proportional hazards models to assess the influence of these genes on LC patient survival. Co-expression networks among the common genes were additionally explored using weighted gene co-expression network analysis (WGCNA). We also developed and deployed a predictive model based on the identified hub genes, which predicted LC progression with high accuracy. The identified biomarkers and pathways hold promise for further translational research and as potential therapeutic targets, advancing the understanding of how LC develops from CLDs.
Finally, the hub genes were validated using the Human Protein Atlas (HPA) database and evaluated through various classification algorithms to ascertain their predictive power and diagnostic potential.
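The shared-DEG and hub-gene steps summarized above can be illustrated with a minimal Python sketch. It assumes the DEG lists for LC and each CLD have already been computed, finds shared DEGs by set intersection, and ranks hub proteins by simple degree centrality in an undirected PPI edge list. The abstract does not state the study's hub-selection criterion or thresholds, so degree centrality is an assumption here, and the function names `shared_degs` and `rank_hubs_by_degree` and the toy gene names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def shared_degs(lc_degs: Iterable[str],
                cld_degs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return, for each CLD, the sorted DEGs it shares with LC (set intersection)."""
    lc = set(lc_degs)
    return {disease: sorted(lc & set(genes)) for disease, genes in cld_degs.items()}


def rank_hubs_by_degree(edges: Iterable[Tuple[str, str]], top_n: int) -> List[str]:
    """Rank proteins by degree in an undirected PPI network.

    Assumption: degree (number of distinct interaction partners) is the hub
    criterion; the study's actual criterion is not given in the abstract.
    """
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not add partners
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Highest degree first; ties broken alphabetically so the output is deterministic.
    return sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))[:top_n]


if __name__ == "__main__":
    # Toy placeholder genes, not data from the study.
    lc = ["G1", "G2", "G3", "G4"]
    clds = {"TB": ["G2", "G3", "G9"], "asthma": ["G4", "G8"]}
    print(shared_degs(lc, clds))  # {'TB': ['G2', 'G3'], 'asthma': ['G4']}

    ppi = [("G1", "G2"), ("G2", "G3"), ("G2", "G4"), ("G3", "G4"), ("G2", "G2")]
    print(rank_hubs_by_degree(ppi, top_n=2))  # ['G2', 'G3']
```

Degree is the simplest hub measure; other centralities such as betweenness or closeness would require computing the corresponding score and using it as the sort key instead.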

Keywords