Compositional uncertainty should not be ignored in high-throughput sequencing data analysis

Gregory Brian Gloor; Jean M. Macklaim; Michael Vu; Andrew D. Fernandes

doi:10.17713/ajs.v45i4.122

Austrian Journal of Statistics (Jul 2016)

Affiliations

Gregory Brian Gloor: The University of Western Ontario
Jean M. Macklaim: Department of Biochemistry, The University of Western Ontario
Michael Vu: Department of Biochemistry, The University of Western Ontario
Andrew D. Fernandes: Department of Applied Mathematics London, Canada

DOI: https://doi.org/10.17713/ajs.v45i4.122
Journal volume & issue: Vol. 45, no. 4

Abstract

High-throughput sequencing generates sparse compositional data, yet these datasets are rarely analyzed using a compositional approach. In addition, the variation inherent in these datasets is rarely acknowledged, and ignoring it can result in many false positive inferences. Using an in vitro selection dataset with an outside standard of truth, we demonstrate that examining point estimates of the data can produce false positive results, even with appropriate zero-replacement approaches. We demonstrate the variation inherent in real high-throughput sequencing datasets and show that this variation can be approximated, and hence accounted for, by Monte-Carlo sampling from the Dirichlet distribution. Used by itself, this approximation is problematic, but it becomes useful when coupled with a log-ratio approach commonly used in compositional data analysis. Thus, the approach illustrated here, which merges Bayesian estimation with principles of compositional data analysis, should be generally useful for high-dimensional count compositional data of the type generated by high-throughput sequencing.
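
The idea in the abstract, Monte-Carlo sampling from the Dirichlet distribution coupled with a log-ratio transform, can be sketched for a single sample as follows. This is a minimal illustration, not the authors' implementation: the function name dirichlet_clr_samples, the prior of 0.5 added to each count, the 128 draws, and the example counts are all assumptions made for this sketch, and the centred log-ratio (clr) is used as one common choice of log-ratio transform.

```python
import numpy as np

def dirichlet_clr_samples(counts, n_draws=128, prior=0.5, seed=0):
    """Draw Dirichlet Monte-Carlo instances for one sample and clr-transform each.

    Hypothetical helper: prior, n_draws and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Treat the observed counts plus a small prior as Dirichlet parameters.
    # The prior keeps features with zero counts strictly positive in each draw.
    props = rng.dirichlet(counts + prior, size=n_draws)  # (n_draws, n_features)
    log_props = np.log(props)
    # Centred log-ratio: subtract the mean log proportion within each draw.
    return log_props - log_props.mean(axis=1, keepdims=True)

# Made-up read counts for five features in one sample (includes a zero).
counts = [0, 3, 120, 45, 7]
clr = dirichlet_clr_samples(counts)

# Summarise each feature over the draws instead of using one point estimate.
clr_mean = clr.mean(axis=0)
clr_sd = clr.std(axis=0)
```

In this sketch, the spread of clr values across draws is much wider for the zero-count feature than for the feature with 120 reads, which illustrates the kind of variation that a single point estimate hides.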

Published in Austrian Journal of Statistics

ISSN: 1026-597X (Print)
Publisher: Austrian Statistical Society
Country of publisher: Austria
LCC subjects: Science: Mathematics: Probabilities. Mathematical statistics; Social Sciences: Statistics
Website: http://www.ajs.or.at
