BMC Bioinformatics (Sep 2021)

A globally diverse reference alignment and panel for imputation of mitochondrial DNA variants

  • Tim W. McInerney,
  • Brian Fulton-Howard,
  • Christopher Patterson,
  • Devashi Paliwal,
  • Lars S. Jermiin,
  • Hardip R. Patel,
  • Judy Pa,
  • Russell H. Swerdlow,
  • Alison Goate,
  • Simon Easteal,
  • Shea J. Andrews,
  • for the Alzheimer’s Disease Neuroimaging Initiative

DOI
https://doi.org/10.1186/s12859-021-04337-8
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Background Variation in mitochondrial DNA (mtDNA) identified by genotyping microarrays or by sequencing only the hypervariable regions of the genome may be insufficient to reliably assign mitochondrial genomes to phylogenetic lineages or haplogroups. This lack of resolution can limit functional and clinical interpretation of a substantial body of existing mtDNA data. To address this limitation, we developed and evaluated a large, curated reference alignment of complete mtDNA sequences as part of a pipeline for imputing missing mtDNA single nucleotide variants (mtSNVs). We call our reference alignment and pipeline MitoImpute. Results We aligned the sequences of 36,960 complete human mitochondrial genomes downloaded from GenBank, filtered and controlled for quality. These sequences were reformatted for use in imputation software, IMPUTE2. We assessed the imputation accuracy of MitoImpute by measuring haplogroup and genotype concordance in data from the 1000 Genomes Project and the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The mean improvement of haplogroup assignment in the 1000 Genomes samples was 42.7% (Matthew’s correlation coefficient = 0.64). In the ADNI cohort, we imputed missing single nucleotide variants. Conclusion These results show that our reference alignment and panel can be used to impute missing mtSNVs in existing data obtained from using microarrays, thereby broadening the scope of functional and clinical investigation of mtDNA. This improvement may be particularly useful in studies where participants have been recruited over time and mtDNA data obtained using different methods, enabling better integration of early data collected using less accurate methods with more recent sequence data.

Keywords