Genome Biology (Apr 2021)
Complete vertebrate mitogenomes reveal widespread repeats and gene duplications
- Giulio Formenti,
- Arang Rhie,
- Jennifer Balacco,
- Bettina Haase,
- Jacquelyn Mountcastle,
- Olivier Fedrigo,
- Samara Brown,
- Marco Rosario Capodiferro,
- Farooq O. Al-Ajli,
- Roberto Ambrosini,
- Peter Houde,
- Sergey Koren,
- Karen Oliver,
- Michelle Smith,
- Jason Skelton,
- Emma Betteridge,
- Jale Dolucan,
- Craig Corton,
- Iliana Bista,
- James Torrance,
- Alan Tracey,
- Jonathan Wood,
- Marcela Uliano-Silva,
- Kerstin Howe,
- Shane McCarthy,
- Sylke Winkler,
- Woori Kwak,
- Jonas Korlach,
- Arkarachai Fungtammasan,
- Daniel Fordham,
- Vania Costa,
- Simon Mayes,
- Matteo Chiara,
- David S. Horner,
- Eugene Myers,
- Richard Durbin,
- Alessandro Achilli,
- Edward L. Braun,
- Adam M. Phillippy,
- Erich D. Jarvis,
- The Vertebrate Genomes Project Consortium
Affiliations
- Giulio Formenti
- The Vertebrate Genome Lab, Rockefeller University
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health
- Jennifer Balacco
- The Vertebrate Genome Lab, Rockefeller University
- Bettina Haase
- The Vertebrate Genome Lab, Rockefeller University
- Jacquelyn Mountcastle
- The Vertebrate Genome Lab, Rockefeller University
- Olivier Fedrigo
- The Vertebrate Genome Lab, Rockefeller University
- Samara Brown
- Laboratory of Neurogenetics of Language, Rockefeller University
- Marco Rosario Capodiferro
- Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia
- Farooq O. Al-Ajli
- Monash University Malaysia Genomics Facility, School of Science
- Roberto Ambrosini
- Department of Environmental Science and Policy, University of Milan
- Peter Houde
- Department of Biology, New Mexico State University
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health
- Karen Oliver
- Wellcome Sanger Institute
- Michelle Smith
- Wellcome Sanger Institute
- Jason Skelton
- Wellcome Sanger Institute
- Emma Betteridge
- Wellcome Sanger Institute
- Jale Dolucan
- Wellcome Sanger Institute
- Craig Corton
- Wellcome Sanger Institute
- Iliana Bista
- Wellcome Sanger Institute
- James Torrance
- Wellcome Sanger Institute
- Alan Tracey
- Wellcome Sanger Institute
- Jonathan Wood
- Wellcome Sanger Institute
- Marcela Uliano-Silva
- Wellcome Sanger Institute
- Kerstin Howe
- Wellcome Sanger Institute
- Shane McCarthy
- Wellcome Sanger Institute
- Sylke Winkler
- Max Planck Institute of Molecular Cell Biology & Genetics
- Woori Kwak
- Hoonygen
- Jonas Korlach
- Pacific Biosciences
- Arkarachai Fungtammasan
- DNAnexus Inc.
- Daniel Fordham
- Oxford Nanopore Technologies Ltd, Oxford Science Park
- Vania Costa
- Oxford Nanopore Technologies Ltd, Oxford Science Park
- Simon Mayes
- Oxford Nanopore Technologies Ltd, Oxford Science Park
- Matteo Chiara
- Department of Biosciences, University of Milan
- David S. Horner
- Department of Biosciences, University of Milan
- Eugene Myers
- Max Planck Institute of Molecular Cell Biology & Genetics
- Richard Durbin
- Wellcome Sanger Institute
- Alessandro Achilli
- Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia
- Edward L. Braun
- Department of Biology, University of Florida
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health
- Erich D. Jarvis
- The Vertebrate Genome Lab, Rockefeller University
- The Vertebrate Genomes Project Consortium
- DOI
- https://doi.org/10.1186/s13059-021-02336-9
- Journal volume & issue
- Vol. 22, no. 1, pp. 1–22
Abstract
Abstract Background Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. Results As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Conclusions Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
Keywords