Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation

Mikhail Arbatsky; Ekaterina Vasilyeva; Veronika Sysoeva; Ekaterina Semina; Valeri Saveliev; Kseniya Rubina

doi:10.3389/fbinf.2025.1519468

Frontiers in Bioinformatics (Feb 2025)

Affiliations

Mikhail Arbatsky: Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
Ekaterina Vasilyeva: Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
Veronika Sysoeva: Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia
Ekaterina Semina: Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia; Institute of Medicine and Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
Valeri Saveliev: Institute of Higher Technologies, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
Kseniya Rubina: Faculty of Medicine, Lomonosov Moscow State University, Moscow, Russia

DOI: https://doi.org/10.3389/fbinf.2025.1519468
Journal volume & issue: Vol. 5

Abstract

Processing biological data is an increasingly important challenge: the volume of accumulated data grows every year, as does the number of methods for studying biological objects. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we focus on a small set of mathematical methods applied during standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (the last relying on machine learning methods). Normalization and scaling are the standard preprocessing steps, with LogNormalize (natural-log transformation), CLR (centered log-ratio transformation), and RC (relative counts) used as data transformation methods. The justification for applying these methods in biology is not discussed in methodological articles. For dimensionality reduction, the essential task is to identify stable patterns that mathematical processing deliberately removes as redundant, even though they contain minor details important for biological interpretation. There are no established rules for integrating datasets obtained at different sampling times or under different conditions. The use of clustering likewise needs to be reconsidered specifically for biological data. The novelty of the present study lies in an integrated approach that combines biology and bioinformatics to draw biological insights from data processing.
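
To make the three transformations named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' code and not Seurat itself (which is an R package). It assumes the definitions given in the documentation of Seurat's NormalizeData: RC divides each count by the cell total and multiplies by a scale factor, LogNormalize applies the natural log of (1 + x) to that result, and CLR divides each value by a geometric-mean-like term before applying the same log. The function names, the toy counts, and the scale factor of 10,000 are illustrative assumptions.

```python
import math

def relative_counts(cell_counts, scale_factor=10_000.0):
    """RC: each gene count divided by the cell's total count, times scale_factor."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [c / total * scale_factor for c in cell_counts]

def log_normalize(cell_counts, scale_factor=10_000.0):
    """LogNormalize: relative counts followed by the natural log of (1 + x)."""
    return [math.log1p(x) for x in relative_counts(cell_counts, scale_factor)]

def clr(values):
    """CLR (assumed form): log1p(x / exp(sum(log1p(positive x)) / n))."""
    log_sum = sum(math.log1p(v) for v in values if v > 0)
    denom = math.exp(log_sum / len(values))
    return [math.log1p(v / denom) for v in values]

# Hypothetical raw counts for four genes in one cell.
toy_cell = [0, 1, 3, 6]

rc = relative_counts(toy_cell)
ln = log_normalize(toy_cell)
cl = clr(toy_cell)

for name, vec in (("RC", rc), ("LogNormalize", ln), ("CLR", cl)):
    print(name, [round(v, 3) for v in vec])
```

All three transformations keep zero counts at zero and preserve the ordering of genes within the cell. What differs is how strongly large counts are compressed, which is one reason the choice of method can affect downstream biological interpretation.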

Published in Frontiers in Bioinformatics

ISSN: 2673-7647 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.frontiersin.org/journals/bioinformatics
