Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 2; peer review: 2 approved]
Maxime Garcia,
Szilveszter Juhos,
Malin Larsson,
Pall I. Olason,
Marcel Martin,
Jesper Eisfeldt,
Sebastian DiLorenzo,
Johanna Sandgren,
Teresita Díaz De Ståhl,
Philip Ewels,
Valtteri Wirta,
Monica Nistér,
Max Käller,
Björn Nystedt
Affiliations
Maxime Garcia
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Szilveszter Juhos
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Malin Larsson
Department of Physics, Chemistry and Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Linköping University, Linköping, 58183, Sweden
Pall I. Olason
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Marcel Martin
Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Box 1031, Solna, 17121, Sweden
Jesper Eisfeldt
Clinical Genetics, Department of Molecular Medicine and Surgery, Karolinska Institutet, MMK L1:00, Karolinska University Hospital at Solna, Stockholm, 171 76, Sweden
Sebastian DiLorenzo
Department of Medical Sciences, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Johanna Sandgren
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Teresita Díaz De Ståhl
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Philip Ewels
Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, Solna, 17121, Sweden
Valtteri Wirta
Department of Microbiology, Tumor and Cell Biology, Clinical Genomics Facility, Science for Life Laboratory, Karolinska Institutet, Box 1031, Solna, 171 21, Sweden
Monica Nistér
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Max Käller
School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, Solna, 17121, Sweden
Björn Nystedt
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Abstract
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it well suited for deployment on any POSIX-compatible computer and in cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation, and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.