Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

Johannes Smolander; Sofia Khan; Kalaimathy Singaravelu; Leni Kauko; Riikka J. Lund; Asta Laiho; Laura L. Elo

doi:10.1186/s12864-021-07686-z

BMC Genomics (May 2021)

Affiliations

Johannes Smolander: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Sofia Khan: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Kalaimathy Singaravelu: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Leni Kauko: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Riikka J. Lund: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Asta Laiho: Turku Bioscience Centre, University of Turku and Åbo Akademi University
Laura L. Elo: Turku Bioscience Centre, University of Turku and Åbo Akademi University

Journal volume & issue: Vol. 22, no. 1
pp. 1 – 15

Abstract

Background: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data, which is used in various research and clinical applications such as digital karyotyping and single-cell CNV detection.

Results: Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validations were used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs ( 3 h) compared with FREEC (~ 3 min), which we considered the second-best method.

Conclusions: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for detecting large copy number variations when their length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
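To give a sense of scale for the coverage range quoted in the abstract (0.0005–0.8×), the following is a rough back-of-the-envelope sketch and not part of the study. It assumes a genome of about 3.1 Gbp, 100 bp reads, and Poisson-distributed read counts per bin; the function names, bin sizes, and constants are all hypothetical choices made for illustration.

```python
import math

# Assumptions for illustration only; none of these values come from the paper.
GENOME_SIZE_BP = 3.1e9   # approximate human genome length
READ_LENGTH_BP = 100     # assumed read length


def total_reads(coverage, genome_size_bp=GENOME_SIZE_BP,
                read_length_bp=READ_LENGTH_BP):
    """Number of reads needed to reach the given average coverage."""
    return coverage * genome_size_bp / read_length_bp


def expected_reads_per_bin(coverage, bin_size_bp,
                           read_length_bp=READ_LENGTH_BP):
    """Mean number of reads falling in one bin at the given average coverage."""
    return coverage * bin_size_bp / read_length_bp


def poisson_relative_noise(mean_reads):
    """Coefficient of variation of a Poisson count, i.e. 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_reads)


if __name__ == "__main__":
    for coverage in (0.0005, 0.01, 0.8):
        for bin_size_bp in (1e5, 1e6):
            mean = expected_reads_per_bin(coverage, bin_size_bp)
            noise = poisson_relative_noise(mean)
            print(f"coverage={coverage}x bin={int(bin_size_bp)} bp "
                  f"mean reads/bin={mean:.1f} relative noise={noise:.2f}")
```

Under these assumptions, a 1 Mbp bin at 0.0005× holds only about five reads on average, so the sampling noise is comparable in size to the 50% depth change expected from a single-copy gain or loss in a diploid region. This is consistent with the abstract's point that detection is most reliable for CNVs spanning millions of base pairs.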

Published in BMC Genomics

ISSN: 1471-2164 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Biology (General): Genetics
Website: http://bmcgenomics.biomedcentral.com
