A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data

Anjali Silva; Steven J. Rothstein; Paul D. McNicholas; Sanjeena Subedi

doi:10.1186/s12859-019-2916-0

BMC Bioinformatics (Jul 2019)

Affiliations

Anjali Silva: Department of Mathematics and Statistics, University of Guelph
Steven J. Rothstein: Department of Molecular and Cellular Biology, University of Guelph
Paul D. McNicholas: Department of Mathematics and Statistics, McMaster University
Sanjeena Subedi: Department of Mathematical Sciences, Binghamton University

Journal volume & issue: Vol. 20, no. 1, pp. 1–11

Abstract

Background: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products.

Results: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection.

Conclusions: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust.
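
To make the abstract's claim about correlation and overdispersion concrete, the following is a minimal sketch of how counts could be simulated from a mixture of MPLN distributions. It assumes the standard MPLN hierarchy (a latent multivariate normal vector whose exponential is the Poisson mean) and omits any library-size normalization. The function name `simulate_mpln_mixture` and all parameter values are hypothetical illustrations; this is not the MPLNClust code and it performs no parameter estimation.

```python
import numpy as np


def simulate_mpln_mixture(n, weights, means, covs, seed=0):
    """Draw n count vectors from a mixture of MPLN components.

    Hypothetical helper for illustration only (not part of MPLNClust).
    Assumed hierarchy for an observation in component g:
        theta ~ N_d(means[g], covs[g]),  Y_j | theta_j ~ Poisson(exp(theta_j)).
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_components = len(weights)
    d = len(means[0])

    # Latent component membership for every observation.
    labels = rng.choice(n_components, size=n, p=weights)

    # Latent log-scale expression, drawn from the component's normal.
    theta = np.empty((n, d))
    for g in range(n_components):
        idx = np.flatnonzero(labels == g)
        if idx.size == 0:
            continue
        theta[idx] = rng.multivariate_normal(means[g], covs[g], size=idx.size)

    # Observed counts: conditionally independent Poisson draws.
    counts = rng.poisson(np.exp(theta))
    return counts, labels


# Illustrative (made-up) parameters: two components, three dimensions.
means = [np.array([2.0, 2.5, 3.0]), np.array([4.0, 3.5, 1.0])]
covs = [
    np.array([[0.3, 0.2, 0.0], [0.2, 0.3, 0.0], [0.0, 0.0, 0.3]]),
    np.array([[0.2, -0.1, 0.0], [-0.1, 0.2, 0.0], [0.0, 0.0, 0.2]]),
]
counts, labels = simulate_mpln_mixture(2000, [0.4, 0.6], means, covs, seed=1)

# Within a component, the sample variance exceeds the sample mean
# (overdispersion relative to a plain Poisson model).
comp0 = counts[labels == 0]
print(comp0.mean(axis=0), comp0.var(axis=0))
```

Under this hierarchy the marginal mean of dimension j is exp(mu_j + sigma_jj / 2) and the marginal variance is that mean plus its square times (exp(sigma_jj) - 1), so the variance always exceeds the mean. The off-diagonal entries of the covariance matrix induce positive or negative correlation between counts.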

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
