FastGFDs: Efficient Validation of Graph Functional Dependencies with Desbordante

George Chernishev; Anton Chernikov; Yurii Litvinov; Kirill Smirnov

doi:10.5281/zenodo.8005293

Proceedings of the XXth Conference of Open Innovations Association FRUCT (May 2023)

FastGFDs: Efficient Validation of Graph Functional Dependencies with Desbordante

George Chernishev,
Anton Chernikov,
Yurii Litvinov,
Kirill Smirnov

Affiliations

George Chernishev: Saint-Petersburg State University
Anton Chernikov: Saint-Petersburg State University
Yurii Litvinov: Saint-Petersburg State University
Kirill Smirnov: Saint-Petersburg State University

DOI: https://doi.org/10.5281/zenodo.8005293
Journal volume & issue: Vol. 33, no. 2
pp. 346 – 352

Abstract

Read online

Graph functional dependencies (GFD) are a recently-developed concept aimed at capturing both topological structures in graphs and functional dependencies between attributes. The process of verifying whether a given GFD holds over a particular graph is referred to as GFD validation. In this very computationally expensive problem, locating suitable subgraphs accounts for about 99\% of the total run time. The concept's authors originally proposed a parallel scheme (algorithm), targeting specifically clusters of high-performance servers. The goal of this study is to open GFD validation to a broader public by making it possible to run it on a consumer class PC. Our initial experiments demonstrated that the existing algorithm may not be optimal for these purposes. Therefore, we propose FastGFDs~--- a GFD validation algorithm that employs a recently developed graph matching technique. In contrast to the parallel scheme, it is sequential and operates on the entire graph. Its novelty lies in the use of Core-First Decomposition and the Compact Path Index (CPI). We compare it with the naive sequential algorithm and the parallel scheme, evaluating run times and memory consumption. The current study is the first step towards designing an efficient algorithm for GFD validation in low-end single-node environments. We also provide an open-source implementation of GFD validation over large data graphs. To the best of our knowledge, this is the only publicly available implementation of an algorithm for this problem. It is developed in Desbordante~--- an open-source high-performance data profiler aimed at science-intensive tasks. Finally, our experiments on a real-life graph demonstrated up to three times performance (2.6x on average) improvement over the parallel scheme. Employing the new subgraph matching algorithm also reduced memory consumption by five times.

data profiling graph functional dependencies data mining

Published in Proceedings of the XXth Conference of Open Innovations Association FRUCT

ISSN: 2305-7254 (Print); 2343-0737 (Online)
Publisher: FRUCT
Country of publisher: Finland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: http://fruct.org/publication

About the journal

Abstract

Keywords