Discrete Mathematics & Theoretical Computer Science (Jan 2006)

Efficient estimation of the cardinality of large data sets

  • Philippe Chassaing
  • Lucas Gerin

DOI
https://doi.org/10.46298/dmtcs.3492
Journal volume & issue
Vol. DMTCS Proceedings vol. AG,..., no. Proceedings

Abstract


Giroire has recently proposed an algorithm that returns the approximate number of distinct elements in a large sequence of words, under strong constraints arising from the analysis of large databases. His estimate is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimator, using Kullback information and estimation theory.
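The Python sketch below illustrates the general principle behind this kind of estimator. It is not Giroire's algorithm and not the optimal estimator derived in this note. It makes three assumptions. First, each word is hashed to a pseudo-uniform value in $[0,1)$; SHA-256 is an illustrative choice. Second, repeated words produce the same value and are therefore ignored. Third, the number of distinct words is estimated from the k-th smallest value observed. The sketch relies on a standard fact: if $M_k$ is the k-th smallest of $n$ independent uniforms on $[0,1]$, then $E[1/M_k] = n/(k-1)$ for $k \ge 2$. The function names and the default `k = 64` are hypothetical.

```python
import bisect
import hashlib


def to_unit_interval(word: str) -> float:
    """Hash a word to a pseudo-uniform value in [0, 1).

    SHA-256 is an illustrative choice, not the hash used in the paper.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a 64-bit integer and scale into [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(words, k: int = 64) -> float:
    """Estimate the number of distinct words from the k-th smallest hash value.

    If M_k is the k-th smallest of n independent uniforms on [0, 1],
    then E[1 / M_k] = n / (k - 1), so (k - 1) / M_k is unbiased for n.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    smallest: list[float] = []  # the k smallest distinct hash values, kept sorted
    for word in words:
        h = to_unit_interval(word)
        i = bisect.bisect_left(smallest, h)
        if i < len(smallest) and smallest[i] == h:
            continue  # repeated word: same hash value, already recorded
        if len(smallest) < k:
            smallest.insert(i, h)
        elif i < k:
            smallest.insert(i, h)
            smallest.pop()  # drop the value no longer among the k smallest
    if len(smallest) < k:
        # Fewer than k distinct values were seen: return the exact count
        # (assuming no hash collisions).
        return float(len(smallest))
    return (k - 1) / smallest[-1]


# Example: a stream of 5000 words containing 1000 distinct ones.
stream = [f"w{i % 1000}" for i in range(5000)]
print(estimate_distinct(stream))
```

The sketch stores only `k` floating-point values, however long the stream is. The result depends only on the set of hash values, so word order and repetitions do not affect it. With a single bucket, the relative spread of $(k-1)/M_k$ is roughly $1/\sqrt{k-2}$. For that reason, practical schemes such as Giroire's combine several buckets instead of relying on one large `k`.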

Keywords