IEEE Access (Jan 2024)

The Landscape of Compressibility Measures for Two-Dimensional Data

  • Lorenzo Carfagna,
  • Giovanni Manzini

DOI
https://doi.org/10.1109/ACCESS.2024.3417621
Journal volume & issue
Vol. 12
pp. 87268 – 87283

Abstract

Read online

In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $\gamma $ measure defined in terms of the smallest string attractor, and the $\delta $ measure defined in terms of the number of distinct substrings of the input string. Concretely, we introduce the two-dimensional measures $\gamma _{2D}$ and $\delta _{2D}$ , as natural generalizations of $\gamma $ and $\delta $ , and we initiate the study of their properties. Among other things, we prove that $\delta _{2D}$ is monotone and can be computed in linear time, and we show that, although it is still true that $\delta _{2D}\leq \gamma _{2D}$ , the gap between the two measures can be $\Omega (\sqrt {n})$ and therefore asymptotically larger than the gap between $\gamma $ and $\delta $ . To complete the scenario of two-dimensional compressibility measures, we introduce the measure $b_{2D}$ which generalizes to two dimensions the notion of optimal parsing. We prove that, somewhat surprisingly, the relationship between $b_{2D}$ and $\gamma _{2D}$ is significantly different than in the one-dimensional case. As an application of our results we provide the first analysis of the space usage of the two-dimensional block tree introduced in [Brisaboa et al., Two-dimensional block trees, The computer Journal, 2024]. Our analysis shows that the space usage can be bounded in terms of both $\gamma _{2D}$ and $\delta _{2D}$ . Finally, using insights from our analysis, we design the first linear time and space algorithm for constructing the two-dimensional block tree for arbitrary matrices.

Keywords