Austrian Journal of Statistics (Feb 2020)

Fast Approximate Complete-data k-nearest-neighbor Estimation

  • Alejandro Murua,
  • Nicolas Wicker

DOI
https://doi.org/10.17713/ajs.v49i2.907
Journal volume & issue
Vol. 49, no. 2

Abstract

Read online

We introduce a fast method to estimate the complete-data set of k-nearest-neighbors.This is equivalent to finding an estimate of the k-nearest-neighbor graph of the data. The method relies on random normal projections. The k-nearest-neighbors are estimated by sorting points in a number of random lines. For very large datasets, the method is quasi-linear in the data size. As an application, we show that the intrinsic dimension of a manifold can be reliably estimated from the estimated set of k-nearest-neighbors in time about two orders of magnitude faster than when using the exact set of k-nearest-neighbors.