HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

Philippe Flajolet; Éric Fusy; Olivier Gandouet; Frédéric Meunier

doi:10.46298/dmtcs.3545

Discrete Mathematics & Theoretical Computer Science (Jan 2007)

Affiliations

Philippe Flajolet: Algorithms
Éric Fusy: Algorithms
Olivier Gandouet: Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Frédéric Meunier: Algorithms

DOI: https://doi.org/10.46298/dmtcs.3545
Journal volume & issue: Vol. DMTCS Proceedings vol. AH,..., no. Proceedings

Abstract

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HYPERLOGLOG, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes"), HYPERLOGLOG performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about $1.04/\sqrt{m}$. This improves on the best previously known cardinality estimator, LOGLOG, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond $10^9$ with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
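
The figures quoted in the abstract can be checked with a small illustrative sketch. The Python code below is not the authors' reference implementation: the class and function names, the use of 64 bits of SHA-1 as the hash function, and the default of m = 2048 registers are assumptions made for illustration. It follows the standard HyperLogLog recipe (one register per bucket holding the maximum position of the leftmost 1-bit, a bias-corrected harmonic mean, and a linear-counting correction for small cardinalities), and it merges two sketches by taking register-wise maxima, which is what makes the algorithm easy to parallelize. The 64% figure follows from LOGLOG's standard error of about $1.30/\sqrt{m}$, since $(1.04/1.30)^2 = 0.64$.

```python
import hashlib
import math


def standard_error(m: int) -> float:
    """Typical relative error quoted in the abstract: about 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(m)


def loglog_memory_ratio() -> float:
    """Fraction of LogLog's memory needed to match its error of about 1.30 / sqrt(m)."""
    return (1.04 / 1.30) ** 2


class HyperLogLogSketch:
    """Illustrative sketch only; names and hash choice are assumptions."""

    HASH_BITS = 64  # assumption: 64 hash bits instead of a 32-bit hash

    def __init__(self, b: int = 11) -> None:
        if b < 7:
            raise ValueError("the bias constant used here assumes m >= 128")
        self.b = b                      # number of bits used as register index
        self.m = 1 << b                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        rest_bits = self.HASH_BITS - self.b
        j = h >> rest_bits                      # first b bits select a register
        w = h & ((1 << rest_bits) - 1)          # remaining bits
        rho = rest_bits - w.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[j] = max(self.registers[j], rho)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # bias correction for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:            # small-range (linear counting) correction
            return m * math.log(m / zeros)
        return raw

    def merge(self, other: "HyperLogLogSketch") -> "HyperLogLogSketch":
        """Sketch of the union: register-wise maximum of two same-sized sketches."""
        if self.b != other.b:
            raise ValueError("sketches must use the same number of registers")
        merged = HyperLogLogSketch(self.b)
        merged.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return merged


if __name__ == "__main__":
    sketch = HyperLogLogSketch()
    for i in range(100_000):
        sketch.add(f"item-{i}")
    print(f"estimate: {sketch.estimate():.0f}")
    print(f"expected relative error: {standard_error(sketch.m):.2%}")
    print(f"memory ratio versus LogLog: {loglog_memory_ratio():.2f}")
```

With m = 2048 registers, `standard_error` gives roughly 2.3%, in line with the "typical accuracy of 2%" quoted above. A production implementation would pack each register into a few bits rather than a Python integer; the list used here trades memory for readability.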

Published in Discrete Mathematics & Theoretical Computer Science

ISSN: 1462-7264 (Print); 1365-8050 (Online)
Publisher: Discrete Mathematics & Theoretical Computer Science
Country of publisher: France
LCC subjects: Science: Mathematics
Website: https://dmtcs.episciences.org/
