Nature Communications (Jan 2025)
Small variant benchmark from a complete assembly of X and Y chromosomes
- Justin Wagner,
- Nathan D. Olson,
- Jennifer McDaniel,
- Lindsay Harris,
- Brendan J. Pinto,
- David Jáspez,
- Adrián Muñoz-Barrera,
- Luis A. Rubio-Rodríguez,
- José M. Lorenzo-Salazar,
- Carlos Flores,
- Sayed Mohammad Ebrahim Sahraeian,
- Giuseppe Narzisi,
- Marta Byrska-Bishop,
- Uday S. Evani,
- Chunlin Xiao,
- Juniper A. Lake,
- Peter Fontana,
- Craig Greenberg,
- Donald Freed,
- Mohammed Faizal Eeman Mootor,
- Paul C. Boutros,
- Lisa Murray,
- Kishwar Shafin,
- Andrew Carroll,
- Fritz J. Sedlazeck,
- Melissa Wilson,
- Justin M. Zook
Affiliations
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr.
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr.
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr.
- Lindsay Harris
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr.
- Brendan J. Pinto
- Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85281 USA; Department of Zoology, Milwaukee Public Museum
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER)
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER)
- Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER)
- José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER)
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER)
- Sayed Mohammad Ebrahim Sahraeian
- Roche Sequencing Solutions
- Giuseppe Narzisi
- New York Genome Center
- Marta Byrska-Bishop
- New York Genome Center
- Uday S. Evani
- New York Genome Center
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- Juniper A. Lake
- Pacific Biosciences
- Peter Fontana
- Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940
- Craig Greenberg
- Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940
- Donald Freed
- Sentieon Inc.
- Mohammed Faizal Eeman Mootor
- Department of Human Genetics, University of California Los Angeles
- Paul C. Boutros
- Department of Human Genetics, University of California Los Angeles
- Lisa Murray
- Illumina
- Kishwar Shafin
- Google Inc, 1600 Amphitheatre Pkwy
- Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy
- Fritz J. Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center
- Melissa Wilson
- Center for Evolution & Medicine and School of Life Sciences, Arizona State University
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr.
- DOI
- https://doi.org/10.1038/s41467-024-55710-z
- Journal volume & issue
- Vol. 16, no. 1, pp. 1 – 7
Abstract
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy number variable gene arrays, and human satellites.