IEEE Access (Jan 2024)
Enhancing Lung Cancer Classification and Prediction With Deep Learning and Multi-Omics Data
Abstract
Lung adenocarcinoma (LUAD), a prevalent histological type of lung cancer and a subtype of non-small cell lung cancer (NSCLC), accounts for 45–55% of all lung cancer cases. Various factors, including environmental influences and genetics, contribute to the initiation and progression of LUAD. Recent large-scale analyses have examined RNA-Seq, miRNA, and DNA methylation alterations in LUAD. In this study, we devised a deep learning model for lung cancer detection that integrates markers from mRNA, miRNA, and DNA methylation data. The first phase consisted of multi-step data preparation, followed by a differential analysis to identify genes that are differentially expressed across lung cancer stages (Stages I, II, III, and IV). DESeq2 was used for the RNA-Seq data, and the LIMMA package was used for the miRNA and DNA methylation datasets. The prepared omics data types were then integrated by selecting the samples common to all of them, yielding a consolidated dataset of 448 samples and 8228 features (genes). Principal component analysis (PCA) was applied to reduce the number of features, and the synthetic minority over-sampling technique (SMOTE) was applied to balance the classes. The integrated and processed data were then passed to the PCA-SMOTE-CNN model for classification. The model, designed specifically for classifying and predicting lung cancer from an integrated omics dataset, was evaluated using precision, recall, F1-score, and accuracy. In the experiments, the proposed model attained an accuracy, precision, recall, and F1-score of 0.97 each, surpassing recent competitive methods.
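The class-balancing step can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the implementation used in the study: the function name `smote_oversample`, its parameters, and the toy `stage_iv` values are hypothetical and chosen only for illustration. The sketch assumes the features have already been reduced (for example by PCA) and shows the core idea of generating each synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between
    minority-class samples and their nearest minority-class neighbours.
    (Hypothetical helper for illustration; not the study's code.)"""
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy stand-in for reduced features of an under-represented stage
# (made-up values, not data from the study).
stage_iv = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(stage_iv, n_new=6, k=2)
balanced = stage_iv + new_points
```

Because every synthetic point is a convex combination of two existing minority samples, the new samples stay within the region already occupied by that class rather than duplicating existing rows. In practice an established library implementation would normally be preferred over a hand-written one.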
Keywords