PLoS ONE (Jan 2022)
A nonlinear correlation measure with applications to gene expression data
Abstract
Nonlinear correlation exists in many types of biomedical data. Several types of pairwise gene expression in humans and other organisms show nonlinear correlation across time, e.g., genes involved in human T helper 17 (Th17) cell differentiation, which motivated this study. The proposed procedure, called Kernelized correlation (Kc), first maps nonlinearly related data on the plane into a high-dimensional (Hilbert) space via a kernel function, which is usually nonlinear. Next, we plug the transformed data into a classical correlation coefficient, e.g., Pearson’s correlation coefficient (r), to yield a nonlinear correlation measure. An algorithm to compute Kc is developed, and the R code is provided online. In three simulated nonlinear cases, when noise in the data is moderate, Kc with the RBF kernel (Kc-RBF) outperforms Pearson’s r and the well-known distance correlation (dCor). When noise is low, however, Pearson’s r and dCor perform slightly better than Kc-RBF in Cases 1 and 3 and equivalently to it in Case 2; Kendall’s tau performs worse than the aforementioned measures in all cases. In Application 1, which aims to discover genes involved in early Th17 cell differentiation, Kc detects the nonlinear correlations of four genes with IL17A (a known marker gene), while dCor detects the nonlinear correlations of two of these pairs and DESeq fails on all of them. In a second application, Kc outperforms Pearson’s r and dCor in estimating the nonlinear correlation of negatively correlated gene pairs in yeast cell cycle regulation. In conclusion, Kc is a simple and competent procedure for measuring pairwise nonlinear correlations.
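The two-step idea described above (kernel transform, then a classical coefficient) can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and the abstract does not specify the exact Kc algorithm, so the construction here is an assumption: each variable is turned into an RBF Gram matrix, the matrices are doubly centered, and Pearson's r is computed on their flattened entries (a quantity also known as centered kernel alignment). The function names `rbf_gram`, `median_bandwidth`, `double_center`, `pearson`, and `kernel_corr` are hypothetical, and the median-distance bandwidth is a common heuristic rather than a choice taken from the paper.

```python
import math
from statistics import median


def rbf_gram(v, sigma):
    """Gram matrix with entries exp(-(v_i - v_j)^2 / (2 * sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in v] for a in v]


def median_bandwidth(v):
    """Median pairwise distance (assumed heuristic, not from the paper)."""
    dists = [abs(a - b) for i, a in enumerate(v) for b in v[i + 1:]]
    sigma = median(dists)
    if sigma == 0:
        raise ValueError("bandwidth is zero; pass sigma explicitly")
    return sigma


def double_center(K):
    """Subtract row and column means and add back the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def pearson(a, b):
    """Classical Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def kernel_corr(x, y, sigma_x=None, sigma_y=None):
    """Pearson's r applied to the centered RBF Gram matrices of x and y."""
    sx = sigma_x if sigma_x is not None else median_bandwidth(x)
    sy = sigma_y if sigma_y is not None else median_bandwidth(y)
    Kx = double_center(rbf_gram(x, sx))
    Ky = double_center(rbf_gram(y, sy))
    flat_x = [e for r in Kx for e in r]
    flat_y = [e for r in Ky for e in r]
    return pearson(flat_x, flat_y)


# Noise-free quadratic relation on a symmetric grid: linear correlation vanishes.
x = [-1 + 2 * i / 99 for i in range(100)]
y = [t * t for t in x]
print(pearson(x, y))      # Pearson's r on the raw data
print(kernel_corr(x, y))  # the same coefficient on the kernel-transformed data
```

For this symmetric, noise-free quadratic example, Pearson's r on the raw data is essentially zero, while the kernel-based value is clearly positive. Note that this particular construction is non-negative, so it measures the strength of dependence but not its sign; how Kc treats negatively correlated pairs, as in the yeast application, is not recoverable from the abstract.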