BMC Genomics (Apr 2025)
Benchmarking foundation cell models for post-perturbation RNA-seq prediction
Abstract
Accurately predicting cellular responses to perturbations is essential for understanding cell behaviour in both healthy and diseased states. While perturbation data is ideal for building such predictive models, its availability is considerably lower than that of baseline (non-perturbed) cellular data. To address this limitation, several foundation cell models have been developed using large-scale single-cell gene expression data. After pre-training, these models are fine-tuned for specific tasks, such as predicting post-perturbation gene expression profiles, and are considered state-of-the-art for these problems. However, proper benchmarking of these models remains an unsolved challenge. In this study, we benchmarked two recently published foundation models, scGPT and scFoundation, against baseline models. Surprisingly, we found that even the simplest baseline model, taking the mean of the training examples, outperformed scGPT and scFoundation. Furthermore, basic machine learning models that incorporate biologically meaningful features outperformed scGPT by a large margin. Additionally, we found that the current Perturb-Seq benchmark datasets exhibit low perturbation-specific variance, making them suboptimal for evaluating such models. Our results highlight important limitations in current benchmarking approaches and provide insights into evaluating post-perturbation gene expression prediction models more effectively.
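The mean baseline mentioned in the abstract can be sketched in a few lines; the toy array and variable names below are illustrative, not the authors' actual implementation. It simply predicts, for every held-out perturbation, the per-gene mean over all training post-perturbation expression profiles:

```python
import numpy as np

# Hypothetical toy data: each row is the post-perturbation expression
# profile of one training perturbation; each column is a gene.
train_profiles = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
])

# The "mean" baseline returns the same profile for every unseen
# perturbation: the per-gene average across the training set.
mean_baseline_prediction = train_profiles.mean(axis=0)
print(mean_baseline_prediction)  # -> [2. 1. 1.]
```

Because the prediction ignores the identity of the query perturbation entirely, a model must capture genuine perturbation-specific signal to beat it, which is what makes it a useful floor for benchmarking.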
Keywords