Computational Ecology and Software (Jun 2012)

Permutation tests to estimate significances on Principal Components Analysis

  • Vasco M. N. C. S. Vieira

Journal volume & issue
Vol. 2, no. 2
pp. 103 – 123

Abstract

Principal Component Analysis is the most widely used multivariate technique to summarize information in a data collection with many variables. However, for it to be valid and useful, the meaningful information must be retained and the noisy information must be filtered out. To achieve this, an index is estimated from the original data set, after which three classes of methodologies may be used: (i) the analytical solution to the distribution of the index under the assumption that the data have a multivariate normal distribution, (ii) the numerical solution to the distribution of the index by means of permutation tests, without any assumption about the data distribution, and (iii) the bootstrap numerical solution to the percentiles of the index and their comparison to the value assumed under the null hypothesis, again without any assumption about the data distribution. New indices are proposed for use with permutation tests and are compared with previous ones through application to several data sets. Their advantages and drawbacks are discussed, together with the adequacy of permutation tests and the inadequacy of both bootstrap techniques and methods that rely on the assumption of multivariate normal distributions.
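
To make approach (ii) concrete, the following is a minimal sketch of a permutation test on a PCA index, written in Python with NumPy. It is an illustration under stated assumptions, not the method of the paper: the index used here is simply the proportion of variance explained by each component of the correlation matrix, which is one commonly used choice and not one of the new indices proposed in the article. The function names, the number of permutations and the simulated data are all hypothetical.

```python
import numpy as np


def explained_variance_ratios(data):
    """Eigenvalues of the correlation matrix, largest first, scaled to sum to 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()


def pca_permutation_test(data, n_perm=999, seed=0):
    """Permutation p-value for each component's explained-variance ratio.

    Assumed index (not the paper's): the ratio for the k-th component is
    compared with the k-th ratio obtained after shuffling every variable
    independently, which destroys the correlation between variables while
    keeping each variable's own distribution.
    """
    rng = np.random.default_rng(seed)
    observed = explained_variance_ratios(data)
    exceed = np.ones_like(observed)  # the observed data count as one permutation
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        exceed += explained_variance_ratios(shuffled) >= observed
    return observed, exceed / (n_perm + 1)


# Hypothetical data: three variables driven by one latent factor, two of pure noise.
demo_rng = np.random.default_rng(1)
latent = demo_rng.normal(size=(100, 1))
data = np.hstack([
    latent + 0.5 * demo_rng.normal(size=(100, 3)),
    demo_rng.normal(size=(100, 2)),
])

observed, p_values = pca_permutation_test(data)
for k, (ratio, p) in enumerate(zip(observed, p_values), start=1):
    print(f"PC{k}: explained variance ratio = {ratio:.3f}, permutation p = {p:.3f}")
```

Shuffling each column independently is what removes the need for a distributional assumption: the null distribution of the index is built from the data themselves. Components beyond the first are tested here without conditioning on the earlier ones, which is a simplification of this sketch.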

Keywords