IEEE Access (Jan 2020)

An Out of Memory tSVD for Big-Data Factorization

  • Hector Carrillo-Cabada,
  • Erik Skau,
  • Gopinath Chennupati,
  • Boian Alexandrov,
  • Hristo Djidjev

DOI
https://doi.org/10.1109/ACCESS.2020.3000508
Journal volume & issue
Vol. 8
pp. 107749 – 107759

Abstract

Singular value decomposition (SVD) is a matrix factorization method widely used for dimension reduction, data analytics, information retrieval, and unsupervised learning. For most big-data applications, only the singular values of the SVD are needed. However, methods such as tensor networks require accurate computation of a substantial number of singular vectors, which can be accomplished through truncated SVD (tSVD). Additionally, many real-world datasets are too big to fit into the available memory, which mandates the development of out-of-memory algorithms that assume most of the data resides on an external disk during the entire computation. These algorithms reduce communication with the disk and hide part of the remaining communication by overlapping it with computation on blocks of work. Here, building upon previous work on SVD for dense matrices, we present a method for computing a predetermined number, K, of singular vectors, and the corresponding K singular values, of a matrix that cannot fit in memory. Our out-of-memory tSVD can be used in tensor network algorithms. We describe ways to reduce communication during the computation of the left and right reflectors needed to compute the singular vectors, and we introduce a method for estimating the block sizes needed to hide the communication on parallel file systems.
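To make the tSVD notion concrete, the following is a minimal in-memory sketch of computing the top-K singular triplets by truncating a full dense SVD. This is illustrative only and is not the paper's algorithm: the authors' method operates on disk-resident blocks of a matrix that does not fit in memory, whereas this sketch assumes the whole matrix is available as a NumPy array.

```python
import numpy as np

def tsvd(A, K):
    """Truncated SVD of a dense in-memory matrix.

    Returns the leading K left singular vectors, singular values,
    and right singular vectors (as rows of Vt). Illustrative only;
    the paper's out-of-memory method instead streams blocks from disk.
    """
    # Thin SVD: U is (m, r), s is (r,), Vt is (r, n) with r = min(m, n).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the K dominant singular triplets.
    return U[:, :K], s[:K], Vt[:K, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
U, s, Vt = tsvd(A, K=5)
# U @ diag(s) @ Vt is the best rank-5 approximation of A
# in the Frobenius norm (Eckart-Young theorem).
A5 = U @ np.diag(s) @ Vt
```

A practical out-of-memory variant would replace the single `np.linalg.svd` call with blocked passes over the matrix, overlapping disk reads with computation as described in the abstract.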

Keywords