Identification of kidney cell types in scRNA-seq and snRNA-seq data using machine learning algorithms

Adam Tisch; Siddharth Madapoosi; Stephen Blough; Jan Rosa; Sean Eddy; Laura Mariani; Abhijit Naik; Christine Limonte; Philip McCown; Rajasree Menon; Sylvia E. Rosas; Chirag R. Parikh; Matthias Kretzler; Ahmed Mahfouz; Fadhl Alakwaa

Heliyon (Oct 2024)

Affiliations

Adam Tisch: Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Siddharth Madapoosi: University of Michigan Medical School, Ann Arbor, MI, USA
Stephen Blough: Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Jan Rosa: Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
Sean Eddy: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Laura Mariani: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Abhijit Naik: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Christine Limonte: Division of Nephrology, University of Washington, Seattle, WA, USA
Philip McCown: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
Rajasree Menon: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Sylvia E. Rosas: Kidney and Hypertension Unit, Joslin Diabetes Center and Harvard Medical School, Boston, MA, USA
Chirag R. Parikh: Johns Hopkins School of Medicine, Baltimore, MD, USA
Matthias Kretzler: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Ahmed Mahfouz: Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
Fadhl Alakwaa: Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA (corresponding author)

Journal volume & issue: Vol. 10, no. 19, p. e38567

Abstract

Introduction: Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) provide valuable insights into the cellular states of kidney cells. However, the annotation of cell types often requires extensive domain expertise and time-consuming manual curation, limiting scalability and generalizability. To facilitate this process, we tested the performance of five supervised classification methods for automatic cell type annotation.

Results: We analyzed publicly available sc/snRNA-seq datasets from five expert-annotated studies, comprising 62,120 cells from 79 kidney biopsy samples. Datasets were integrated by harmonizing cell type annotations across studies. Five different supervised machine learning algorithms (support vector machines, random forests, multilayer perceptrons, k-nearest neighbors, and extreme gradient boosting) were applied to automatically annotate cell types using four training datasets and one testing dataset. Performance metrics, including accuracy (F1 score) and rejection rates, were evaluated. All five machine learning algorithms demonstrated high accuracies, with a median F1 score of 0.94 and a median rejection rate of 1.8%. The algorithms performed equally well across different datasets and successfully rejected cell types that were not present in the training data. However, F1 scores were lower when models trained primarily on scRNA-seq data were tested on snRNA-seq data.

Conclusions: Despite limitations including the number of biopsy samples, our findings demonstrate that machine learning algorithms can accurately annotate a wide range of adult kidney cell types in scRNA-seq/snRNA-seq data. This approach has the potential to standardize cell type annotation and facilitate further research on cellular mechanisms underlying kidney disease.
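
The evaluation described in the abstract (a supervised classifier with a rejection option, scored by per-cell-type F1 and rejection rate) can be illustrated with a minimal sketch. The abstract does not say how rejection was implemented, so everything below is an assumption for illustration only: the confidence threshold of 0.7, the "Unassigned" label, the toy two-gene expression vectors, the cell type labels, and all function names are hypothetical and are not taken from the study. A small k-nearest-neighbors classifier in plain Python stands in for the five algorithms.

```python
from collections import Counter
from math import dist
from statistics import median

REJECTED = "Unassigned"  # hypothetical label for rejected cells


def knn_predict(train_x, train_y, query, k=3, min_confidence=0.7):
    """Label `query` by majority vote among its k nearest training cells.

    The cell is rejected when the winning vote share falls below
    `min_confidence` (an assumed threshold, not one from the study).
    """
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label if count / k >= min_confidence else REJECTED


def per_class_f1(y_true, y_pred):
    """F1 per true cell type; a rejected cell counts as a miss for its type."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def rejection_rate(y_pred):
    """Fraction of cells the classifier declined to label."""
    return sum(p == REJECTED for p in y_pred) / len(y_pred)


# Toy data: each cell is (marker gene A, marker gene B) expression.
train_x = [(5.0, 0.2), (4.8, 0.1), (5.2, 0.3), (0.1, 5.0), (0.3, 4.7), (0.2, 5.1)]
train_y = ["POD", "POD", "POD", "PT", "PT", "PT"]

# The third test cell sits between both clusters and should be ambiguous.
test_x = [(4.9, 0.2), (0.2, 4.9), (2.6, 2.6)]
test_y = ["POD", "PT", "PT"]

predictions = [knn_predict(train_x, train_y, cell) for cell in test_x]
f1_scores = per_class_f1(test_y, predictions)
median_f1 = median(f1_scores.values())
rejected = rejection_rate(predictions)

print(predictions, f1_scores, median_f1, rejected)
```

In this sketch a rejected cell lowers recall for its true type, which is one possible convention; F1 could instead be computed only over cells that received a label, and the abstract does not state which was used.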

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
