Applications in Plant Sciences (Jul 2024)

nQuack: An R package for predicting ploidal level from sequence data using site‐based heterozygosity

  • Michelle L. Gaynor,
  • Jacob B. Landis,
  • Timothy K. O'Connor,
  • Robert G. Laport,
  • Jeff J. Doyle,
  • Douglas E. Soltis,
  • José Miguel Ponciano,
  • Pamela S. Soltis

DOI
https://doi.org/10.1002/aps3.11606
Journal volume & issue
Vol. 12, no. 4

Abstract


Premise
Traditional methods of ploidal‐level estimation are tedious; using DNA sequence data for cytotype estimation is an ideal alternative. Multiple statistical approaches to leverage sequence data for ploidy inference based on site‐based heterozygosity have been developed. However, these approaches may require high‐coverage sequence data, use inappropriate probability distributions, or have additional statistical shortcomings that limit inference abilities. We introduce nQuack, an open‐source R package that addresses the main shortcomings of current methods.

Methods and Results
nQuack performs model selection for improved ploidy predictions. Here, we implement expectation‐maximization algorithms with normal, beta, and beta‐binomial distributions. Using extensive computer simulations that account for variability in sequencing depth, as well as real data sets, we demonstrate the utility and limitations of nQuack.

Conclusions
Inferring ploidy based on site‐based heterozygosity alone is difficult. Even though nQuack is more accurate than similar methods, we suggest caution when relying on any site‐based heterozygosity method to infer ploidy.
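To illustrate the general idea behind site‐based heterozygosity with expectation maximization and model selection, here is a minimal sketch. It is written in Python and is not nQuack's R API. It assumes that each observation is the allele ratio (reads of one allele divided by total depth) at a heterozygous biallelic site, and that a k‐ploid sample yields ratios clustered near j/k for j = 1, ..., k − 1. Only the normal distribution is shown, although the abstract also lists beta and beta‐binomial distributions. The component means are held fixed, and EM estimates only the mixing weights and a shared standard deviation. The candidate set (diploid to tetraploid) and the use of BIC for model selection are illustrative assumptions.

```python
import math
import random

# Assumed candidate cytotypes: fixed expected allele ratios j/k for ploidy k.
CANDIDATES = {
    "diploid": [1 / 2],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [1 / 4, 2 / 4, 3 / 4],
}

def log_normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def logsumexp(vals):
    # Numerically stable log(sum(exp(v))) to avoid underflow.
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def mixture_loglik(x, means, w, sigma):
    return sum(
        logsumexp([math.log(w[j]) + log_normal_pdf(xi, means[j], sigma)
                   for j in range(len(means))])
        for xi in x)

def em_fixed_means(x, means, sigma=0.1, max_iter=300, tol=1e-8):
    """EM for a normal mixture with fixed means; estimates weights and a shared sigma."""
    n, k = len(x), len(means)
    w = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each site belongs to each component.
        resp = []
        for xi in x:
            logs = [math.log(w[j]) + log_normal_pdf(xi, means[j], sigma) for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update weights (floored to keep log(w) finite) and the shared sigma.
        w = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(k)]
        total_w = sum(w)
        w = [wj / total_w for wj in w]
        var = sum(r[j] * (xi - means[j]) ** 2
                  for xi, r in zip(x, resp) for j in range(k)) / n
        sigma = max(math.sqrt(var), 1e-4)
        ll = mixture_loglik(x, means, w, sigma)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, sigma, ll

def select_ploidy(x):
    """Fit every candidate and return the one with the lowest BIC (an assumed criterion)."""
    n = len(x)
    scores = {}
    for name, means in CANDIDATES.items():
        _, _, ll = em_fixed_means(x, means)
        n_params = (len(means) - 1) + 1  # free mixing weights + shared sigma
        scores[name] = n_params * math.log(n) - 2.0 * ll
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Simulated allele ratios from a hypothetical triploid sample.
    random.seed(1)
    ratios = [random.gauss(random.choice([1 / 3, 2 / 3]), 0.04) for _ in range(500)]
    best, bic = select_ploidy(ratios)
    print(best, {name: round(score, 1) for name, score in bic.items()})
```

Fixing the means at j/k keeps each candidate cytotype distinct, so model selection compares hypotheses instead of free‐floating clusters. As the abstract cautions, real data with low or uneven depth blur these peaks. This is why the choice of distribution (for example, beta‐binomial for count data) matters.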

Keywords