PLoS ONE (Jan 2023)

Interrogating 1000 insect genomes for NUMTs: A risk assessment for estimates of species richness.

  • Paul D N Hebert,
  • Dan G Bock,
  • Sean W J Prosser

DOI
https://doi.org/10.1371/journal.pone.0286620
Journal volume & issue
Vol. 18, no. 6
p. e0286620

Abstract

Read online

The nuclear genomes of most animal species include NUMTs, segments of the mitogenome incorporated into their chromosomes. Although NUMT counts are known to vary greatly among species, there has been no comprehensive study of their frequency/attributes in the most diverse group of terrestrial organisms, insects. This study examines NUMTs derived from a 658 bp 5' segment of the cytochrome c oxidase I (COI) gene, the barcode region for the animal kingdom. This assessment is important because unrecognized NUMTs can elevate estimates of species richness obtained through DNA barcoding and derived approaches (eDNA, metabarcoding). This investigation detected nearly 10,000 COI NUMTs ≥ 100 bp in the genomes of 1,002 insect species (range = 0-443). Variation in nuclear genome size explained 56% of the mitogenome-wide variation in NUMT counts. Although insect orders with the largest genome sizes possessed the highest NUMT counts, there was considerable variation among their component lineages. Two thirds of COI NUMTs possessed an IPSC (indel and/or premature stop codon) allowing their recognition and exclusion from downstream analyses. The remainder can elevate species richness as they showed 10.1% mean divergence from their mitochondrial homologue. The extent of exposure to "ghost species" is strongly impacted by the target amplicon's length. NUMTs can raise apparent species richness by up to 22% when a 658 bp COI amplicon is examined versus a doubling of apparent richness when 150 bp amplicons are targeted. Given these impacts, metabarcoding and eDNA studies should target the longest possible amplicons while also avoiding use of 12S/16S rDNA as they triple NUMT exposure because IPSC screens cannot be employed.