Frontiers in Genetics (Jun 2016)

Random Projection for fast and efficient multivariate correlation analysis of high-dimensional data: A new approach

  • Claudia Grellmann,
  • Jane Neumann,
  • Sebastian Bitzer,
  • Peter Kovacs,
  • Anke Tönjes,
  • Lars Tjelta Westlye,
  • Ole Andreas Andreassen,
  • Michael Stumvoll,
  • Arno Villringer,
  • Annette Horstmann

DOI
https://doi.org/10.3389/fgene.2016.00102
Journal volume & issue
Vol. 7

Abstract

In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an expanding range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in functional genomics. To handle high-dimensional problems, dimension reduction can be applied as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into Partial Least Squares Correlation to efficiently solve high-dimensional multimodal problems such as genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole-genome SNP measures as genotypes and whole-brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is independent of data type. PLSC-RP therefore opens up a wide range of possible applications: it can be used for any integrative analysis that combines information from multiple sources.
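
The general idea of combining the two techniques can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that PLSC is computed as the singular value decomposition of the cross-product of the column-centered data matrices, and that RP is a Gaussian random matrix applied to the variables of the high-dimensional block. The function names `plsc` and `plsc_rp`, the target dimension `k`, and the choice of a Gaussian projection are all assumptions made for illustration.

```python
import numpy as np


def plsc(X, Y):
    """PLSC sketch: SVD of the cross-product of column-centered X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    R = Yc.T @ Xc  # cross-product matrix, shape (q, p)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # U: saliences for Y, s: singular values, Vt.T: saliences for X
    return U, s, Vt.T


def plsc_rp(X, Y, k, seed=0):
    """Hypothetical PLSC-RP sketch: project X to k columns, then run PLSC."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # Gaussian random projection matrix (assumed), scaled so that
    # squared norms are preserved in expectation.
    Phi = rng.standard_normal((p, k)) / np.sqrt(k)
    X_low = X @ Phi  # reduced data, shape (n, k)
    U, s, V_low = plsc(X_low, Y)
    # V_low holds saliences in the reduced space; Phi is returned so that
    # they can be related back to the original variables.
    return U, s, V_low, Phi
```

In this sketch the decomposition operates on a matrix with `k` columns instead of `p`, which is where a speed-up for very large `p` would come from under these assumptions.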

Keywords