Efficient estimation of the cardinality of large data sets

Philippe Chassaing; Lucas Gerin

doi:10.46298/dmtcs.3492

Discrete Mathematics & Theoretical Computer Science (Jan 2006)

Affiliations

Philippe Chassaing: Institut Élie Cartan de Nancy
Lucas Gerin: Institut Élie Cartan de Nancy

Journal volume & issue: Vol. DMTCS Proceedings vol. AG,..., no. Proceedings

Abstract

Giroire has recently proposed an algorithm that returns the $\textit{approximate}$ number of distinct elements in a large sequence of words, under the strong constraints that arise in the analysis of large databases. His estimate is based on statistical properties of uniform random variables in $[0,1]$. In this note we propose an optimal estimator, derived using Kullback information and estimation theory.
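
To make the general idea concrete, here is a minimal sketch of cardinality estimation from order statistics of hashed values. It is not Giroire's algorithm and not the optimal estimator proposed in this note; it is the generic "k-th minimum value" technique, shown only to illustrate how uniform random variables in $[0,1]$ can yield an approximate distinct count in bounded memory. The names `hash_to_unit` and `estimate_cardinality`, the choice of SHA-256 as the hash, and the parameter `k` are all assumptions made for this illustration.

```python
import hashlib
import heapq


def hash_to_unit(word: str) -> float:
    """Map a word to a pseudo-uniform value in [0, 1) (illustrative choice of hash)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(words, k: int = 64) -> float:
    """Estimate the number of distinct words from the k smallest hash values.

    Assumption: hashed values behave like i.i.d. uniforms on [0, 1], so the
    k-th smallest of n distinct values is close to k / n.
    """
    heap = []      # max-heap (negated values) holding the k smallest hashes
    kept = set()   # the same values, for duplicate detection
    for word in words:
        h = hash_to_unit(word)
        if h in kept:
            continue  # a repeated word hashes to the same value; ignore it
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # h is smaller than the current k-th minimum: replace the largest kept value
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct values seen: count is exact
    kth_minimum = -heap[0]
    return (k - 1) / kth_minimum
```

Memory use depends only on `k`, not on the length of the sequence, which is the kind of constraint the abstract refers to. Larger `k` trades memory for accuracy.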

Published in Discrete Mathematics & Theoretical Computer Science

ISSN: 1462-7264 (Print); 1365-8050 (Online)
Publisher: Discrete Mathematics & Theoretical Computer Science
Country of publisher: France
LCC subjects: Science: Mathematics
Website: https://dmtcs.episciences.org/
