Journal of Big Data (Oct 2024)
HepScope: CNN-based single-cell discrimination of malignant hepatocytes
Abstract
Hepatocellular carcinoma (HCC) is a major health problem worldwide. This study introduces the HepScope gene set, developed through a hybrid approach that integrates single-cell RNA sequencing (scRNA-seq), spatial transcriptomics (stRNA-seq), bulk RNA-seq, and proteomics data with an adapted Convolutional Neural Network (CNN). The HepScope gene set comprises 113 genes significantly upregulated in malignant hepatocytes, 77 of which are unique to this set relative to other HCC-related gene sets. When scored with Seurat's module score, the HepScope gene set distinguished malignant from non-malignant hepatocytes better than five other gene sets across multiple datasets. A 1D-CNN model adapted for the HepScope gene set achieved higher accuracy (0.71), AUROC (0.82), and F1 score (0.85) than models trained on other gene sets. Cross-validation and benchmarking against alternative models confirmed HepScope’s consistent performance. Additionally, we evaluated the prognostic capability of the HepScope gene set in two independent cohorts, where the HepScope-based risk score was a strong independent predictor of overall and disease-free survival. Collectively, our findings position HepScope as a promising gene set for both diagnostic and prognostic applications in HCC, with potential use in precision medicine and personalized therapy development.
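The idea behind scoring cells with a gene set can be illustrated with a minimal sketch. This is not Seurat's module score implementation, which selects control genes from expression-matched bins; it is a simplified stand-in that subtracts the mean of randomly drawn background genes from the mean of the gene-set genes. The function name `simple_module_score`, its parameters, and the toy matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def simple_module_score(expr, gene_names, gene_set, n_control=50, seed=0):
    """Simplified per-cell gene-set score (assumption: NOT Seurat's algorithm).

    expr       : array of shape (n_cells, n_genes), normalized expression
    gene_names : sequence of n_genes gene identifiers
    gene_set   : iterable of gene identifiers to score
    """
    expr = np.asarray(expr, dtype=float)
    gene_names = np.asarray(gene_names)
    in_set = np.isin(gene_names, list(gene_set))
    if not in_set.any():
        raise ValueError("no gene from gene_set is present in gene_names")

    background = np.flatnonzero(~in_set)
    if background.size == 0:
        raise ValueError("no background genes available for the control set")

    # Draw control genes at random from the background (no expression binning).
    rng = np.random.default_rng(seed)
    n_control = min(n_control, background.size)
    control = rng.choice(background, size=n_control, replace=False)

    # Score = mean gene-set expression minus mean control expression, per cell.
    return expr[:, in_set].mean(axis=1) - expr[:, control].mean(axis=1)


# Toy data: 4 cells x 6 genes; cells 0 and 1 over-express the two set genes.
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
toy_expr = np.ones((4, 6))
toy_expr[:2, :2] = 5.0
toy_expr[2:, :2] = 0.0
scores = simple_module_score(toy_expr, genes, {"g0", "g1"})
```

In this toy example the first two cells receive a positive score and the last two a negative one, which is the kind of separation a discriminative gene set is expected to produce between malignant and non-malignant cells.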
Keywords