Big Data Mining and Analytics (Sep 2024)
G3DC: A Gene-Graph-Guided Selective Deep Clustering Method for Single Cell RNA-seq Data
Abstract
Single-cell RNA sequencing (scRNA-seq) technology measures the expression of thousands of genes at the cellular level. Analyzing single-cell transcriptomes allows the identification of heterogeneous cell groups, cellular-level regulation, and the trajectory of cell development. An important step in the analysis of scRNA-seq data is the clustering of cells, which is hampered by issues such as high dimensionality, cell type imbalance, redundancy, and dropout. Given that cells of each type are functionally consistent, incorporating biological relations among genes may improve the clustering results. In light of this, we have developed a deep-embedded clustering method, G3DC. This method combines a graph regularization based on a pre-existing gene network and a feature selector based on ℓ2,1-norm regularization, along with a reconstruction loss, to generate a discriminative and informative embedding. Utilizing the gene interaction network bolsters clustering performance and aids in selecting functionally coherent genes, consequently enriching the clustering results. Extensive experiments show that G3DC achieves high clustering accuracy, measured by agreement with true cell types, and outperforms other leading single-cell clustering methods. In addition, G3DC selects biologically relevant genes that contribute to the clustering, providing insight into the biological functions that differentiate cell groups.
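The three ingredients named above (reconstruction loss, ℓ2,1-norm feature selection, and gene-graph regularization) can be illustrated with a minimal NumPy sketch. The abstract does not state the exact objective, so everything specific here is an assumption made for illustration: a weight matrix `W` with one row per gene, a symmetric gene-network adjacency matrix `A`, the unnormalized graph Laplacian, a mean-squared reconstruction error, and the hypothetical trade-off weights `lam_sparse` and `lam_graph`. It is not the authors' implementation.

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum of the Euclidean norms of the rows of W.

    With one row per gene (an assumption), penalizing this term pushes
    whole rows toward zero, which acts as gene-level feature selection.
    """
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())

def graph_penalty(W, A):
    """Graph regularizer tr(W^T L W), with L = D - A.

    A is assumed to be a symmetric gene-gene adjacency matrix. The term
    equals the sum over edges of A_ij * ||w_i - w_j||^2, so genes linked
    in the network are encouraged to have similar weight rows.
    """
    L = np.diag(A.sum(axis=1)) - A  # unnormalized graph Laplacian
    return float(np.trace(W.T @ L @ W))

def reconstruction_loss(X, X_hat):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((X - X_hat) ** 2))

def combined_objective(X, X_hat, W, A, lam_sparse=1.0, lam_graph=1.0):
    """Weighted sum of the three terms (weights are hypothetical)."""
    return (reconstruction_loss(X, X_hat)
            + lam_sparse * l21_norm(W)
            + lam_graph * graph_penalty(W, A))
```

For a two-gene toy case with `W = [[3, 4], [0, 0]]` and a single edge between the two genes, `l21_norm` gives 5 and `graph_penalty` gives 25: the first term counts one active gene row of length 5, and the second penalizes the squared distance between the two connected rows.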
Keywords