Computational and Structural Biotechnology Journal (Dec 2024)
scIALM: A method for sparse scRNA-seq expression matrix imputation using the Inexact Augmented Lagrange Multiplier with low error
Abstract
Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology that quantifies gene expression profiles of specific cell populations at the single-cell level, providing a foundation for studying cellular heterogeneity and patient pathological characteristics. It is effective for developmental, fertility, and disease studies. However, the cell-gene expression matrix of single-cell sequencing data is often sparse and contains numerous zero values. Some of the zero values derive from noise, where dropout noise has a large impact on downstream analysis. In this paper, we propose a method named scIALM for imputation recovery of sparse single-cell RNA data expression matrices, which employs the Inexact Augmented Lagrange Multiplier method to use sparse but clean (accurate) data to recover unknown entries in the matrix. We perform experimental analysis on four datasets, calling the expression matrix after Quality Control (QC) as the original matrix, and comparing the performance of scIALM with six other methods using mean squared error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC), and cosine similarity (CS). Our results demonstrate that scIALM accurately recovers the original data of the matrix with an error of 10e-4, and the mean value of the four metrics reaches 4.5072 (MSE), 0.765 (MAE), 0.8701 (PCC), 0.8896 (CS). In addition, at 10%-50% random masking noise, scIALM is the least sensitive to the masking ratio. For downstream analysis, this study uses adjusted rand index (ARI) and normalized mutual information (NMI) to evaluate the clustering effect, and the results are improved on three datasets containing real cluster labels.