IEEE Access (Jan 2020)
Optimizing Distance Computation in Distributed Graph Systems
Abstract
Given a large graph, such as a social network or a knowledge graph, a fundamental query is to find the distance from a source vertex to another vertex in the graph. As real graphs grow very large and distributed graph systems such as Pregel, Pregel+, Giraph, and GraphX have been proposed, the question of how to process single-source distance queries on these systems deserves more attention. In this paper, we propose a landmark-based framework to optimize distance computation over distributed graph systems. We use a measure called set betweenness to select the optimal set of landmarks for distance computation. Although we prove that selecting the optimal set of landmarks is NP-hard, we propose a heuristic distributed algorithm with a guaranteed approximation ratio. Experiments on large real graphs confirm the superiority of our methods.
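For readers unfamiliar with landmarks, the following minimal single-machine sketch illustrates the general idea behind landmark-based distance bounds. It is not the distributed framework proposed in this paper. It assumes an undirected, unweighted graph stored as an adjacency dictionary, and the names `bfs_distances` and `landmark_upper_bound` are hypothetical, chosen only for this illustration. Distances from each landmark are precomputed once, and the triangle inequality d(s, t) <= d(s, l) + d(l, t) then gives an upper bound on any query distance.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_upper_bound(landmark_dists, s, t):
    """Smallest d(l, s) + d(l, t) over all landmarks l that reach both s and t.

    By the triangle inequality this never underestimates d(s, t) in an
    undirected graph. Returns infinity if no landmark reaches both vertices.
    """
    best = float("inf")
    for dist in landmark_dists.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best


# Toy path graph a - b - c - d - e (illustrative data, not from the paper).
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

# Assumption: the landmark set is given. Here it is picked by hand.
landmarks = ["c"]
landmark_dists = {l: bfs_distances(adj, l) for l in landmarks}

estimate = landmark_upper_bound(landmark_dists, "a", "e")
exact = bfs_distances(adj, "a")["e"]
```

In this toy graph the bound for the pair (a, e) is tight because the landmark c lies on a shortest path between them, while the bound for (a, b) is loose because it does not. This is the intuition for preferring landmarks that many shortest paths pass through.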
Keywords