Algorithms (Dec 2020)

Re-Pair in Small Space

  • Dominik Köppl,
  • Tomohiro I,
  • Isamu Furuya,
  • Yoshimasa Takabatake,
  • Kensuke Sakai,
  • Keisuke Goto

DOI
https://doi.org/10.3390/a14010005
Journal volume & issue
Vol. 14, no. 1
p. 5

Abstract

Read online

Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given a text of length n whose characters are drawn from an integer alphabet with size σ=nO(1), an O(min(n2,n2lglogτnlglglgn/logτn)) time algorithm computing Re-Pair with max((n/c)lgn,nlgτ)+O(lgn) bits of working space including the text space, where c≥1 is a fixed user-defined constant and τ is the sum of σ and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm seems not practical since a preliminary version already needs roughly one hour for computing Re-Pair on one megabyte of text.

Keywords