Analysis of Microbiome Data in the Presence of Excess Zeros

Abhishek Kaul; Siddhartha Mandal; Ori Davidov; Shyamal D. Peddada

doi:10.3389/fmicb.2017.02114

Frontiers in Microbiology (Nov 2017)

Affiliations

Abhishek Kaul: Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIH), Durham, NC, United States
Siddhartha Mandal: Public Health Foundation of India, Gurgaon, India
Ori Davidov: Department of Statistics, University of Haifa, Haifa, Israel
Shyamal D. Peddada: Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIH), Durham, NC, United States

Journal volume & issue: Vol. 8

Abstract

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy for handling these excess zeros is to add a small number, called a pseudo-count (e.g., 1). Other strategies use various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, it is not ideal, as demonstrated in this paper. On the other hand, methods that model excess zeros with a probability model often make the implicit assumption that all zeros can be explained by a common probability model. As described in this article, this assumption is not always appropriate, because there are potentially three types (sources) of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate the three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure group). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data.

Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate (FDR) at a desired level of significance but also competes well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test, which ignores the underlying simplex structure in the data, has an inflated FDR.

Published in Frontiers in Microbiology

ISSN: 1664-302X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.frontiersin.org/journals/microbiology
