PeerJ (Mar 2020)

Probabilistic model based on circular statistics for quantifying coverage depth dynamics originating from DNA replication

  • Shinya Suzuki,
  • Takuji Yamada

DOI
https://doi.org/10.7717/peerj.8722
Journal volume & issue
Vol. 8
p. e8722

Abstract


Background: With the development of DNA sequencing technology, static omics profiling of microbial communities, such as determining taxonomic and functional gene composition, has become possible. Additionally, the recently proposed in situ growth rate estimation method extends the applicable range of comparative metagenomics to dynamic profiling. However, the range of targets to which this method can be applied is presently limited. Furthermore, the characteristics of coverage depth during replication have not been sufficiently investigated.

Results: We developed a probabilistic model that mimics coverage depth dynamics. This statistical model explains both the bias in coverage depth caused by DNA replication and the errors that arise when coverage depth is observed. Although our method requires a complete genome sequence, its estimates remain stable at low coverage depths (>0.01×). We also evaluated the estimation using real whole-genome sequence datasets and reproduced the growth dynamics observed in previous studies. By utilizing a circular distribution in the model, our method facilitates the quantification of previously unmeasured coverage depth features around the replication origin, including peakedness, skewness, and degree of density. When we applied the model to time-series culture samples, the skewness parameter, which indicates asymmetry, was stable over time; however, the peakedness and degree of density parameters, which indicate the concentration level at the replication origin, changed dynamically. Furthermore, we demonstrated the measurement of activity at multiple replication origins in a single chromosome.

Conclusions: We devised a novel framework for quantifying coverage depth dynamics. Our study is expected to serve as a basis for estimating replication activity from a broader perspective using the statistical model.
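To make the idea of a circular distribution over a genome more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which circular distribution the model uses, so a von Mises density is assumed here purely for illustration. The function names (`bessel_i0`, `von_mises_pdf`, `expected_coverage`) and the parameters `origin`, `kappa`, and `mean_depth` are hypothetical. Positions on a circular chromosome are mapped to angles, and the expected depth in each bin is the mean depth scaled by the density at that angle.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (x/2)^(2k) / (k!)^2
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle (assumed here; not taken from the paper)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def expected_coverage(genome_length, origin, kappa, mean_depth, n_bins=100):
    """Expected depth at the centre of each bin of a circular genome.

    The density is multiplied by 2*pi so that the average over all bins
    is approximately mean_depth; kappa = 0 gives a flat profile.
    """
    mu = 2.0 * math.pi * origin / genome_length
    depths = []
    for b in range(n_bins):
        pos = (b + 0.5) * genome_length / n_bins  # bin centre in base pairs
        theta = 2.0 * math.pi * pos / genome_length
        depths.append(mean_depth * 2.0 * math.pi * von_mises_pdf(theta, mu, kappa))
    return depths


if __name__ == "__main__":
    # Hypothetical 1 Mbp genome with the origin at 255 kbp and a mean depth of 0.05x.
    profile = expected_coverage(1_000_000, 255_000, kappa=1.0, mean_depth=0.05)
    peak_bin = profile.index(max(profile))
    print("peak bin:", peak_bin)
    print("peak / trough ratio:", max(profile) / min(profile))
```

Under this assumed shape, the depth peaks at the bin containing the origin, and the ratio between the highest and lowest depth equals exp(2 * kappa), so a larger `kappa` corresponds to reads concentrating more strongly around the origin. Features such as skewness would need a more flexible circular family than the symmetric one sketched here.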

Keywords