Viruses (Apr 2024)

Virus Quasispecies Rarefaction: Subsampling with or without Replacement?

  • Josep Gregori,
  • Marta Ibañez-Lligoña,
  • Sergi Colomer-Castell,
  • Carolina Campos,
  • Josep Quer

DOI
https://doi.org/10.3390/v16050710
Journal volume & issue
Vol. 16, no. 5
p. 710

Abstract

In quasispecies diversity studies, two samples of different sizes often need to be compared. However, certain diversity indices are sensitive to sample size, which poses a challenge. Rarefaction addresses this issue by normalizing the samples so that they are fairly comparable. This study emphasizes the need for sample size normalization in quasispecies diversity studies using next-generation sequencing (NGS) data. We present a thorough examination of resampling schemes using several simple hypothetical quasispecies that differ in structure, in the sense of haplotype genomic composition, and discuss their implications for general cases. Despite the large numbers involved in this sort of study, which often involves coverages exceeding 100,000 reads per sample and amplicon, rarefaction for normalization should be performed with repeated resampling without replacement, especially when rare haplotypes constitute a significant fraction of interest. Different diversity indices, however, exhibit distinct sensitivities to sample size. Consequently, some diversity indicators may be compared directly without normalization, or may instead be resampled safely with replacement.
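
The two resampling schemes contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the haplotype count vector, and the choice of indices (number of haplotypes and Shannon entropy) are assumptions made for illustration. Subsampling with replacement is modeled as a multinomial draw from the observed frequencies, and subsampling without replacement as a multivariate hypergeometric draw from the observed read counts.

```python
import numpy as np


def rarefy(counts, size, replace, rng):
    """Draw `size` reads from a vector of haplotype read counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if replace:
        # With replacement: multinomial draw from the observed frequencies.
        return rng.multinomial(size, counts / counts.sum())
    # Without replacement: multivariate hypergeometric draw from the reads.
    return rng.multivariate_hypergeometric(counts, size)


def n_haplotypes(counts):
    """Number of distinct haplotypes present in the (sub)sample."""
    return int(np.count_nonzero(counts))


def shannon(counts):
    """Shannon entropy (natural log) of the haplotype frequencies."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def mean_index(counts, size, replace, index, n_rep=100, seed=0):
    """Average a diversity index over repeated rarefactions."""
    rng = np.random.default_rng(seed)
    draws = (rarefy(counts, size, replace, rng) for _ in range(n_rep))
    return float(np.mean([index(d) for d in draws]))


if __name__ == "__main__":
    # Hypothetical quasispecies: one dominant haplotype plus 500 singletons.
    counts = np.array([99_500] + [1] * 500)
    for size in (10_000, 50_000, 100_000):
        for replace in (False, True):
            scheme = "with" if replace else "without"
            print(
                size,
                scheme,
                mean_index(counts, size, replace, n_haplotypes),
                mean_index(counts, size, replace, shannon),
            )
```

In this toy case, rarefying to the full sample size without replacement returns the sample unchanged, whereas resampling with replacement is expected to miss each singleton with probability close to 1/e, so indices that depend on rare haplotypes are biased downward. This is only one hypothetical quasispecies structure, and how much the two schemes differ depends on the structure and on the index considered.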

Keywords