A clustering procedure for three-way RNA sequencing data using data transformations and matrix-variate Gaussian mixture models

Theresa Scharl; Bettina Grün

doi:10.1186/s12859-024-05717-6

BMC Bioinformatics (Mar 2024)

Affiliations

Theresa Scharl: Institute of Statistics, University of Natural Resources and Life Sciences
Bettina Grün: Institute for Statistics and Mathematics, Vienna University of Economics and Business

DOI: https://doi.org/10.1186/s12859-024-05717-6
Journal volume & issue: Vol. 25, no. 1, pp. 1–21

Abstract

RNA sequencing of time-course experiments results in three-way count data where the dimensions are the genes, the time points and the biological units. Clustering RNA-seq data makes it possible to extract groups of genes that are co-expressed over time. After standardisation, the normalised counts of individual genes across time points and biological units have properties similar to those of compositional data. We propose the following procedure to suitably cluster three-way RNA-seq data: (1) pre-process the RNA-seq data by calculating the normalised expression profiles, (2) transform the data using the additive log ratio transform to map the composition in the $D$-part Aitchison simplex to a $(D-1)$-dimensional Euclidean vector, (3) cluster the transformed RNA-seq data using matrix-variate Gaussian mixture models and (4) assess the quality of the overall cluster solution and of individual clusters, based on cluster separation in the transformed space using density-based silhouette information and on compactness of the clusters in the original space using cluster maps as a suitable visualisation. The proposed procedure is illustrated on RNA-seq data from fission yeast, and the results are compared to an analogous two-way approach after flattening out the biological units.
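Steps (1) and (2) of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the counts are stored as a genes × time points × biological units array, each gene's profile is normalised over the time points separately within each biological unit, the last time point is taken as the reference part of the additive log ratio (ALR) transform, and a pseudo-count of 0.5 is added to avoid taking the logarithm of zero. The function names `normalised_profiles` and `alr` are hypothetical.

```python
import numpy as np


def normalised_profiles(counts, pseudo_count=0.5):
    """Turn counts of shape (genes, time points, units) into compositions.

    Assumption: each gene is normalised over the time axis within each
    biological unit, so every profile sums to one. The pseudo-count is an
    illustrative choice to keep all parts strictly positive.
    """
    counts = np.asarray(counts, dtype=float) + pseudo_count
    return counts / counts.sum(axis=1, keepdims=True)


def alr(profiles):
    """Additive log ratio transform along the time axis.

    Maps each D-part composition to a (D - 1)-dimensional vector
    log(x_i / x_D), using the last time point as reference (an assumption).
    """
    log_profiles = np.log(profiles)
    return log_profiles[:, :-1, :] - log_profiles[:, -1:, :]


# Simulated toy data: 5 genes, 4 time points, 3 biological units.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(5, 4, 3))

profiles = normalised_profiles(counts)   # shape (5, 4, 3), sums to 1 over time
transformed = alr(profiles)              # shape (5, 3, 3), one matrix per gene
```

Each gene is then represented by a $(D-1) \times$ units matrix in `transformed`, which is the kind of observation a matrix-variate Gaussian mixture model in step (3) would take as input.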

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
