Journal of Information and Telecommunication (Apr 2020)

Path histogram distance and complete subtree histogram distance for rooted labelled caterpillars

  • Taiga Kawaguchi,
  • Takuya Yoshino,
  • Kouichi Hirata

DOI
https://doi.org/10.1080/24751839.2020.1718443
Journal volume & issue
Vol. 4, no. 2
pp. 199 – 212

Abstract

Read online

A rooted labelled caterpillar (a caterpillar, for short) is a rooted labelled unordered tree transformed to a path after removing all the leaves in it. In this paper, we discuss two histogram distance between caterpillars. One is a path histogram distance as an $L_1$-distance between the histograms of paths from the root to every leaf and another is a complete subtree histogram distance as an $L_1$-distance between the histograms of complete subtrees for every node. While the latter is always a metric for general trees, the former is not a metric. In this paper, we show that, for caterpillars, the path histogram distance is always a metric, simply linear-time computable and incomparable with the edit distance. Furthermore, we give experimental results for caterpillars in real data of comparing the path histogram distance and the complete subtree histogram distance with the isolated-subtree distance as the most general tractable variation of the edit distance.

Keywords