Computational and Structural Biotechnology Journal (Dec 2024)
Review and revamp of compositional data transformation: A new framework combining proportion conversion and contrast transformation
Abstract
Due to the development of next-generation sequencing technology and an increased appreciation of their role in modulating host immunity and their potential as therapeutic agents, the human microbiome has emerged as a key area of interest in various biological investigations of human health and disease. However, microbiome data present a number of statistical challenges not addressed by existing methods, such as the varying sequencing depth, the compositionality, and zero inflation. Solutions like scaling and transformation methods help to mitigate heterogeneity and release constraints, but often introduce biases and yield inconsistent results on the same data. To address these issues, we conduct a systematic review of compositional data transformation, with a particular focus on the connection and distinction of existing techniques. Additionally, we create a new framework that enables the development of new transformations by combining proportion conversion with contrast transformations. This framework includes well-known methods such as Additive Log Ratio (ALR) and Centered Log Ratio (CLR) as special cases. Using this framework, we develop two novel transformations—Centered Arcsine Contrast (CAC) and Additive Arcsine Contrast (AAC)—which show enhanced performance in scenarios with high zero-inflation. Moreover, our findings suggest that ALR and CLR transformations are more effective when zero values are less prevalent. This comprehensive review and the innovative framework provide microbiome researchers with a significant direction to enhance data transformation procedures and improve analytical outcomes.