Identification of kidney cell types in scRNA-seq and snRNA-seq data using machine learning algorithms
Adam Tisch,
Siddharth Madapoosi,
Stephen Blough,
Jan Rosa,
Sean Eddy,
Laura Mariani,
Abhijit Naik,
Christine Limonte,
Philip McCown,
Rajasree Menon,
Sylvia E. Rosas,
Chirag R. Parikh,
Matthias Kretzler,
Ahmed Mahfouz,
Fadhl Alakwaa
Affiliations
Adam Tisch
Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Siddharth Madapoosi
University of Michigan Medical School, Ann Arbor, MI, USA
Stephen Blough
Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Jan Rosa
Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Sean Eddy
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Laura Mariani
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Abhijit Naik
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Christine Limonte
Division of Nephrology, University of Washington, Seattle, WA, USA
Philip McCown
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Rajasree Menon
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Sylvia E. Rosas
Kidney and Hypertension Unit, Joslin Diabetes Center and Harvard Medical School, Boston, MA, USA
Chirag R. Parikh
Johns Hopkins School of Medicine, Baltimore, MD, USA
Matthias Kretzler
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Ahmed Mahfouz
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
Fadhl Alakwaa
Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; Corresponding author.
Introduction: Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) provide valuable insights into the cellular states of kidney cells. However, annotating cell types often requires extensive domain expertise and time-consuming manual curation, which limits scalability and generalizability. To facilitate this process, we tested the performance of five supervised classification methods for automatic cell type annotation.
Results: We analyzed publicly available sc/snRNA-seq datasets from five expert-annotated studies, comprising 62,120 cells from 79 kidney biopsy samples. Datasets were integrated by harmonizing cell type annotations across studies. Five supervised machine learning algorithms (support vector machines, random forests, multilayer perceptrons, k-nearest neighbors, and extreme gradient boosting) were applied to automatically annotate cell types, using four datasets for training and one for testing. Performance was evaluated using F1 scores and rejection rates. All five algorithms performed well, with a median F1 score of 0.94 and a median rejection rate of 1.8%. The algorithms performed equally well across different datasets and successfully rejected cell types that were not present in the training data. However, F1 scores were lower when models trained primarily on scRNA-seq data were tested on snRNA-seq data.
Conclusions: Despite limitations, including the limited number of biopsy samples, our findings demonstrate that machine learning algorithms can accurately annotate a wide range of adult kidney cell types in scRNA-seq/snRNA-seq data. This approach has the potential to standardize cell type annotation and facilitate further research on cellular mechanisms underlying kidney disease.
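The two metrics reported above, median F1 score and rejection rate, can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a toy k-nearest-neighbors classifier in plain Python, and the two-gene expression values, the `Unassigned` label, the `min_vote` threshold, and the choice to count rejected cells as false negatives for their true class are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median

REJECTED = "Unassigned"  # hypothetical label for cells the classifier declines to annotate


def knn_predict(train_x, train_y, cell, k=3, min_vote=0.7):
    """Majority vote among the k nearest training cells.

    The cell is rejected when the winning label holds less than
    `min_vote` of the k votes (an assumed rejection rule).
    """
    neighbors = sorted(
        (math.dist(cell, ref), label) for ref, label in zip(train_x, train_y)
    )[:k]
    label, count = Counter(lbl for _, lbl in neighbors).most_common(1)[0]
    return label if count / k >= min_vote else REJECTED


def per_class_f1(y_true, y_pred):
    """F1 per true class; rejected cells count as false negatives (assumption)."""
    scores = {}
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy training set: (marker gene A, marker gene B) expression per cell.
train_x = [(5.0, 0.2), (4.8, 0.0), (5.2, 0.1), (0.1, 5.0), (0.0, 4.7), (0.3, 5.1)]
train_y = ["podocyte"] * 3 + ["proximal tubule"] * 3

# Toy test set; the last cell lies between the two clusters.
test_x = [(4.9, 0.1), (5.1, 0.0), (0.2, 4.9), (2.6, 2.5)]
test_y = ["podocyte", "podocyte", "proximal tubule", "proximal tubule"]

predictions = [knn_predict(train_x, train_y, cell) for cell in test_x]
rejection_rate = predictions.count(REJECTED) / len(predictions)
f1_by_class = per_class_f1(test_y, predictions)
median_f1 = median(f1_by_class.values())

print(predictions)
print(f"rejection rate: {rejection_rate:.2f}, median F1: {median_f1:.2f}")
```

In this toy setup the ambiguous fourth cell is rejected because its three nearest neighbors split 2 to 1, which falls below the assumed 0.7 vote threshold. A vote-based rule like this only catches ambiguous cells; rejecting a cell type that is entirely absent from the training data would typically need an additional criterion, such as a distance or probability cutoff.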