Horticulturae (Jan 2022)
C-CorA: A Cluster-Based Method for Correlation Analysis of RNA-Seq Data
Abstract
Correlation analysis is a routine method of biological data analysis. In RNA-Seq analysis, differentially expressed genes can be identified by calculating correlation coefficients for gene expression vs. phenotype or gene expression vs. gene expression. However, because of the complicated genetic backgrounds of perennial fruit crops, the correlation coefficients between phenotypes and genes are usually not high in fruit quality studies. In this study, a cluster-based correlation analysis method (C-CorA) is presented for fruit RNA-Seq analysis. C-CorA is composed of two main parts: a clustering analysis and a correlation analysis. The algorithm is described and then implemented in MATLAB code and in a C# WPF project. The C-CorA method was applied to RNA-Seq datasets of loquat (Eriobotrya japonica) fruit stored or ripened under different conditions. Low-temperature conditioning or heat treatment of loquat fruit can reduce the extent of lignification that occurs during postharvest storage at low temperature (0 °C). The C-CorA method generated correlation coefficients and identified many candidate genes correlated with lignification, including EjCAD3, EjCAD4, and transcription factors such as MYB (UN00328). C-CorA is an effective new method for the correlation analysis of various types of data with different dimensions and can be applied to RNA-Seq data for candidate gene detection in fruit quality studies.
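The two-part structure named in the abstract (clustering, then correlation) can be illustrated with a minimal Python sketch. The abstract does not specify the clustering algorithm or how its output feeds the correlation step, so everything concrete here is an assumption made for illustration: samples are grouped by a simple one-dimensional k-means on the phenotype values, and a Pearson coefficient is then computed per gene on the cluster means. The function names (`kmeans_1d`, `pearson`, `cluster_based_correlation`) and the gene labels in the usage note are hypothetical and are not taken from the authors' MATLAB or C# code.

```python
from math import sqrt
from statistics import mean


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a list of numbers; returns one label per value.

    Assumption: the abstract does not name a clustering algorithm, so k-means
    is used here only as a stand-in.
    """
    ordered = sorted(values)
    # Deterministic start: take k evenly spaced values from the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if it is empty).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def pearson(x, y):
    """Pearson correlation coefficient; None when either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def cluster_based_correlation(expr, phenotype, k):
    """Correlate each gene with the phenotype using per-cluster means.

    expr      -- dict mapping a gene name to its expression values per sample
    phenotype -- list of phenotype values, one per sample (same sample order)
    k         -- number of sample clusters (an assumed tuning parameter)
    """
    labels = kmeans_1d(phenotype, k)
    used = sorted(set(labels))  # skip clusters that ended up empty

    def cluster_means(series):
        return [mean(v for v, lab in zip(series, labels) if lab == j) for j in used]

    pheno_means = cluster_means(phenotype)
    return {gene: pearson(cluster_means(vals), pheno_means) for gene, vals in expr.items()}
```

As a usage note, calling `cluster_based_correlation({"gene_x": [...], "gene_y": [...]}, phenotype, k=3)` with one expression list per (hypothetical) gene returns a dictionary of coefficients, with `None` for genes whose cluster means do not vary. Averaging within clusters before correlating is one plausible way to damp sample-level noise; whether it matches the published method should be checked against the full paper.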
Keywords