Genome Biology (Aug 2021)

Sfaira accelerates data and model reuse in single cell genomics

  • David S. Fischer,
  • Leander Dony,
  • Martin König,
  • Abdul Moeed,
  • Luke Zappia,
  • Lukas Heumos,
  • Sophie Tritschler,
  • Olle Holmberg,
  • Hananeh Aliee,
  • Fabian J. Theis

DOI
https://doi.org/10.1186/s13059-021-02452-6
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 21

Abstract


Single-cell RNA-seq datasets are often first analyzed independently without harnessing model fits from previous studies, and are then contextualized with public datasets, requiring time-consuming data wrangling. We address these issues with sfaira, a single-cell data zoo for public datasets paired with a model zoo for executable pre-trained models. The data zoo is designed to facilitate contribution of datasets using ontologies for metadata. We propose an adaptation of cross-entropy loss for cell type classification tailored to datasets annotated at different levels of coarseness. We demonstrate the utility of sfaira by training models across anatomic data partitions on 8 million cells.
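One way to read the loss adaptation mentioned above is as a sketch under stated assumptions, not sfaira's actual code: the classifier predicts over the finest (leaf) cell types, and a coarse annotation such as "T cell" is mapped through the ontology to the set of leaf types it covers. The loss is then the negative log of the summed predicted probability over that set, which reduces to ordinary cross-entropy when the label covers a single leaf. The function names `coarse_cross_entropy` and `_logsumexp` are hypothetical, and the leaf-set encoding of labels is an assumption made for illustration.

```python
import math
from typing import Iterable, Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coarse_cross_entropy(logits: Sequence[float], label_leaves: Iterable[int]) -> float:
    """Cross-entropy for a label that may be coarser than the output layer.

    Hypothetical sketch: `logits` are unnormalized scores over leaf cell types,
    and `label_leaves` holds the indices of every leaf type covered by the
    observed annotation (one index for a fine label, several for a coarse one).

    Returns -log(sum_{i in S} softmax(logits)_i), computed as
    logsumexp(all logits) - logsumexp(logits restricted to S).
    """
    idx = sorted(set(label_leaves))
    if not idx:
        raise ValueError("label must cover at least one leaf cell type")
    return _logsumexp(logits) - _logsumexp([logits[i] for i in idx])


if __name__ == "__main__":
    # Assumed toy ontology: leaves 0 = CD4 T cell, 1 = CD8 T cell, 2 = B cell;
    # the coarse label "T cell" therefore covers leaves {0, 1}.
    logits = [2.0, 1.0, 0.1]
    print("fine label (CD4 T cell):", coarse_cross_entropy(logits, [0]))
    print("coarse label (T cell):  ", coarse_cross_entropy(logits, [0, 1]))
```

Because the coarse label's probability is the sum over its leaves, a cell annotated only as "T cell" is not penalized for how the model splits probability between CD4 and CD8 T cells, while a fine label still receives the standard cross-entropy gradient.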

Keywords