SoftwareX (Jul 2020)

RandPro- A practical implementation of random projection-based feature extraction for high dimensional multivariate data analysis in R

  • R. Siddharth
  • G. Aghila

Journal volume & issue
Vol. 12
p. 100629

Abstract

The performance of high-dimensional multivariate data analysis is seriously affected by the curse of dimensionality. Feature extraction is an important pre-processing step in the data analysis process that helps avoid the curse of dimensionality. The random projection method is the most underrated feature extraction technique, and it performs extremely well on high-dimensional data. The technique is known for its data-independent projection, simple computation and distance-preserving property. The Johnson–Lindenstrauss lemma is the idea behind the random projection method. It states that a small set of points in a high-dimensional space can be embedded into a lower-dimensional subspace in such a way that the pairwise distances between the points are approximately preserved with high probability. This article describes a practical implementation of the random projection method in the popular statistical programming language R and compares it with other similar implementations. The software package for the random projection method has been uploaded to the Comprehensive R Archive Network (CRAN) repository as RandPro, and the code has been distributed on GitHub. The RandPro package is tested with different types of data, including text, image and sensor data. The results show that the RandPro package preserves the pairwise distances between the data points in the corresponding low-dimensional space for further processing.
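The distance-preserving property described in the abstract can be illustrated with a minimal sketch. It is written in plain Python using only the standard library and does not use or reproduce the RandPro package's R interface. The function names, the choice of a Gaussian projection matrix, the dimensions (500 reduced to 100) and the synthetic data are all assumptions made for illustration only.

```python
import math
import random


def gaussian_projection_matrix(d, k, rng):
    """Return a d x k matrix with i.i.d. N(0, 1/k) entries (illustrative choice)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def project(points, matrix):
    """Map each d-dimensional point to k dimensions by multiplying with the matrix."""
    k = len(matrix[0])
    return [
        [sum(x * row[j] for x, row in zip(point, matrix)) for j in range(k)]
        for point in points
    ]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distortion_ratios(original, projected):
    """Ratio of projected to original distance for every pair of points."""
    ratios = []
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            ratios.append(
                distance(projected[i], projected[j])
                / distance(original[i], original[j])
            )
    return ratios


# Synthetic data: n random points in d dimensions, reduced to k dimensions.
rng = random.Random(0)  # fixed seed so the run is repeatable
d, k, n = 500, 100, 8   # hypothetical sizes chosen for this sketch
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

matrix = gaussian_projection_matrix(d, k, rng)
projected = project(points, matrix)
ratios = distortion_ratios(points, projected)

# Ratios close to 1 indicate that pairwise distances are roughly preserved.
print("min ratio:", min(ratios))
print("max ratio:", max(ratios))
```

The projection matrix is generated without looking at the data, which is what makes the projection data independent. Increasing `k` in this sketch should tighten the spread of the ratios around 1, at the cost of a larger reduced representation.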

Keywords