Life (Apr 2024)

Machine Learning Reveals Impacts of Smoking on Gene Profiles of Different Cell Types in Lung

  • Qinglan Ma,
  • Yulong Shen,
  • Wei Guo,
  • Kaiyan Feng,
  • Tao Huang,
  • Yudong Cai

DOI
https://doi.org/10.3390/life14040502
Journal volume & issue
Vol. 14, no. 4
p. 502

Abstract

Smoking significantly elevates the risk of lung diseases such as chronic obstructive pulmonary disease (COPD) and lung cancer. This risk is attributed to the harmful chemicals in tobacco smoke, which damage lung tissue and impair lung function. Research on how smoking affects gene expression in specific lung cell types remains limited. This study addresses that gap by applying machine learning techniques to single-cell gene expression profiles from 43,539 lung endothelial cells, 234,349 lung epithelial cells, 189,843 lung immune cells, and 16,031 lung stromal cells. The data, grouped by lung cell type, were classified into three smoking states: active smoker, former smoker, and never smoker. Each cell was represented by 28,024 gene features. Using an incremental feature selection method within a computational framework, the study identified several genes as potential markers of smoking status in different lung cell types. These include B2M, EEF1A1, and TPT1 in lung endothelial cells; FTL and MT-ATP8 in lung epithelial cells; HLA-B and HLA-C in lung immune cells; and HSP90B1 and LCN2 in lung stromal cells. The study also developed quantitative rules that represent the gene expression patterns related to smoking. This research highlights the potential of machine learning in oncology, enhancing our molecular understanding of smoking’s harm and laying the groundwork for future mechanism-based studies.
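
The incremental feature selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the ranking algorithm, the classifier, or the evaluation scheme, so everything below is an assumption made for illustration: the features are taken as already ranked, a simple nearest-centroid classifier stands in for whatever model the authors used, and leave-one-out accuracy is the score. The expression values and the names gene_a to gene_d are synthetic and hypothetical, not data from the study.

```python
# Illustrative sketch of incremental feature selection (IFS).
# ASSUMPTIONS: the toy data, the feature ranking, the nearest-centroid
# classifier, and leave-one-out scoring are all hypothetical stand-ins;
# none of them is taken from the study itself.

def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, step=1):
    """Score the top-k ranked features for growing k and keep the best k.

    Ties in score are broken in favor of the smaller feature subset.
    """
    curve = []
    for k in range(step, len(ranked_features) + 1, step):
        subset = ranked_features[:k]
        X_sub = [[row[j] for j in subset] for row in X]
        curve.append((k, loo_accuracy(X_sub, y)))
    best_k, best_score = max(curve, key=lambda r: (r[1], -r[0]))
    return ranked_features[:best_k], best_score, curve


# Synthetic expression values for four hypothetical genes in nine cells.
feature_names = ["gene_a", "gene_b", "gene_c", "gene_d"]
X = [
    [0, 0, 3, 7], [1, 1, 9, 2], [0, 1, 5, 5],   # never smoker
    [4, 0, 8, 1], [5, 1, 2, 6], [4, 0, 6, 9],   # former smoker
    [4, 4, 4, 3], [5, 5, 7, 8], [5, 4, 1, 4],   # active smoker
]
y = ["never"] * 3 + ["former"] * 3 + ["active"] * 3

# The ranking is assumed to come from an upstream feature-ranking step.
ranked = [0, 1, 2, 3]

selected, best_score, curve = incremental_feature_selection(X, y, ranked)
selected_names = [feature_names[j] for j in selected]
print(selected_names, best_score)
```

In this toy setting the loop evaluates subsets of size 1 through 4 and keeps the smallest subset that reaches the highest score. With 28,024 gene features, a larger step size than 1 would usually be needed to keep the search tractable.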

Keywords