Analytics (May 2023)
Clustering Matrix Variate Longitudinal Count Data
Abstract
Matrix variate longitudinal discrete data can arise in transcriptomics studies when the data are collected for N genes at r conditions over t time points, and thus, each observation Yn for n=1,…,N can be written as an r×t matrix. When dealing with such data, the number of parameters in the model can be greatly reduced by considering the matrix variate structure. The components of the covariance matrix then also provide a meaningful interpretation. In this work, a mixture of matrix variate Poisson-log normal distributions is introduced for clustering longitudinal read counts from RNA-seq studies. To account for the longitudinal nature of the data, a modified Cholesky-decomposition is utilized for a component of the covariance structure. Furthermore, a parsimonious family of models is developed by imposing constraints on elements of these decompositions. The models are applied to both real and simulated data, and it is demonstrated that the proposed approach can recover the underlying cluster structure.
Keywords