BioTechniques (Mar 2013)

Sequential projection pursuit principal component analysis – dealing with missing data associated with new -omics technologies

  • Bobbie-Jo M. Webb-Robertson,
  • Melissa M. Matzke,
  • Thomas O. Metz,
  • Jason E. McDermott,
  • Hyunjoo Walker,
  • Karin D. Rodland,
  • Joel G. Pounds,
  • Katrina M. Waters

DOI
https://doi.org/10.2144/000113978
Journal volume & issue
Vol. 54, no. 3
pp. 165 – 168

Abstract


Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies such as label-free mass spectrometry due to large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that this approach generates robust and informative low-dimensional data representations compared to commonly used imputation approaches.
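The idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the projection index, the optimizer, or the deflation step, so everything below is an assumption made for illustration. The helper names (`observed_scores`, `projection_variance`, `pursue_component`, `sequential_components`) are hypothetical, the optimizer is a plain random hill climb, and each score is rescaled by the weight mass that was actually observed. The point is only that a variance index can be evaluated from observed entries alone, whereas a covariance matrix computed directly on the same data contains NaN.

```python
import numpy as np

def observed_scores(X, w):
    """Project rows of X onto w using observed (non-NaN) entries only."""
    observed = ~np.isnan(X)
    # Center each column on the mean of its observed values; missing
    # entries are set to 0 so they contribute nothing to the dot product.
    centered = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    scores = centered @ w
    # Assumption: rescale by the weight mass actually observed in each row.
    mass = np.sqrt(observed.astype(float) @ (w ** 2))
    return np.divide(scores, mass, out=np.zeros_like(scores), where=mass > 0)

def projection_variance(X, w):
    """PCA-style projection index: variance of the projected scores."""
    return observed_scores(X, w).var()

def pursue_component(X, n_iter=2000, step=0.1, seed=0):
    """Random hill climb over unit vectors (assumed optimizer)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    best = projection_variance(X, w)
    for _ in range(n_iter):
        candidate = w + step * rng.normal(size=w.size)
        candidate /= np.linalg.norm(candidate)
        value = projection_variance(X, candidate)
        if value > best:  # keep only improving directions
            w, best = candidate, value
    return w, best

def sequential_components(X, k, **kwargs):
    """Find k directions one at a time, deflating after each (assumed scheme)."""
    residual = X - np.nanmean(X, axis=0)
    components = []
    for _index in range(k):
        w, _value = pursue_component(residual, **kwargs)
        components.append(w)
        # Remove the found component; NaN entries stay NaN.
        residual = residual - np.outer(observed_scores(residual, w), w)
    return np.array(components)
```

Calling `sequential_components(X, 2)` on a samples-by-features array with NaN for missing values returns a 2-by-features array of unit-length directions. No values are imputed at any point, although zeroing the centered missing entries inside the dot product is a simplification that a faithful implementation may handle differently.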

Keywords