Frontiers in Plant Science (Mar 2024)

LocoGSE, a sequence-based genome size estimator for plants

  • Pierre Guenzi-Tiberi,
  • Benjamin Istace,
  • Inger Greve Alsos,
  • The PhyloNorway Consortium,
  • Eric Coissac,
  • Sébastien Lavergne,
  • The PhyloAlps Consortium,
  • Jean-Marc Aury,
  • France Denoeud,
  • L.G. Alsos,
  • M.K. Føreid Merkel,
  • Y. Lammers,
  • E. Coissac,
  • C. Pouchon,
  • A. Alberti,
  • F. Denoeud,
  • P. Wincker

DOI
https://doi.org/10.3389/fpls.2024.1328966
Journal volume & issue
Vol. 15

Abstract

Extensive research has focused on exploring the range of genome sizes in eukaryotes, with a particular emphasis on land plants, where significant variability has been observed. Accurate estimation of genome size is essential for various research purposes, but existing sequence-based methods have limitations, particularly for low-coverage datasets. In this study, we introduce LocoGSE, a novel genome size estimator designed specifically for low-coverage datasets generated by genome skimming approaches. LocoGSE relies on mapping reads to single-copy consensus proteins, without the need for a reference genome assembly. We calibrated LocoGSE using 430 low-coverage angiosperm genome skimming datasets and compared its performance against other estimators. Our results demonstrate that LocoGSE accurately predicts monoploid genome size even at very low depth of coverage (<1X) and on highly heterozygous samples. Additionally, LocoGSE provides stable estimates across individuals with varying ploidy levels. LocoGSE fills a gap in sequence-based plant genome size estimation by offering a user-friendly and reliable tool that does not rely on high coverage or reference assemblies. We anticipate that LocoGSE will facilitate plant genome size analysis and contribute to evolutionary and ecological studies in the field. Furthermore, at the cost of an initial calibration, LocoGSE can be used in other lineages.
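
The general principle behind depth-based genome size estimation can be illustrated with a minimal sketch: genome size is approximately the total number of sequenced bases divided by the sequencing depth, with the depth inferred from coverage of single-copy genes. This is not LocoGSE's actual implementation. The function name, the use of a median to summarise per-gene depth, and the `slope` calibration factor are all assumptions made here for illustration only.

```python
from statistics import median


def estimate_genome_size(total_bases, per_gene_depths, slope=1.0):
    """Rough monoploid genome size estimate, in bases.

    total_bases     -- total number of sequenced bases in the read set
    per_gene_depths -- depth observed on each single-copy gene
    slope           -- hypothetical calibration factor converting the
                       observed depth into the true sequencing depth
                       (an assumption, not a LocoGSE parameter)
    """
    if not per_gene_depths:
        raise ValueError("need at least one single-copy gene depth")

    # The median is robust to a few genes with aberrant coverage.
    observed_depth = median(per_gene_depths)

    # Apply the (assumed) calibration to obtain the sequencing depth.
    depth = slope * observed_depth
    if depth <= 0:
        raise ValueError("estimated depth must be positive")

    # Genome size = sequenced bases / depth of coverage.
    return total_bases / depth


# 500 Mb of reads at a median single-copy depth of 0.5X
size = estimate_genome_size(500_000_000, [0.4, 0.5, 0.6])
print(f"{size / 1e9:.2f} Gb")
```

In this toy example the calibration factor is left at 1.0. In practice such a factor would have to be fitted on samples of known genome size, which is the role the abstract assigns to the calibration on 430 datasets.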

Keywords