Applied Network Science (Jun 2019)
Non-backtracking cycles: length spectrum theory and graph mining applications
Abstract
Abstract Graph distance and graph embedding are two fundamental tasks in graph mining. For graph distance, determining the structural dissimilarity between networks is an ill-defined problem, as there is no canonical way to compare two networks. Indeed, many of the existing approaches for network comparison differ in their heuristics, efficiency, interpretability, and theoretical soundness. Thus, having a notion of distance that is built on theoretically robust first principles and that is interpretable with respect to features ubiquitous in complex networks would allow for a meaningful comparison between different networks. For graph embedding, many of the popular methods are stochastic and depend on black-box models such as deep networks. Regardless of their high performance, this makes their results difficult to analyze which hinders their usefulness in the development of a coherent theory of complex networks. Here we rely on the theory of the length spectrum function from algebraic topology, and its relationship to the non-backtracking cycles of a graph, in order to introduce two new techniques: Non-Backtracking Spectral Distance (NBD) for measuring the distance between undirected, unweighted graphs, and Non-Backtracking Embedding Dimensions (NBED) for finding a graph embedding in low-dimensional space. Both techniques are interpretable in terms of features of complex networks such as presence of hubs, triangles, and communities. We showcase the ability of NBD to discriminate between networks in both real and synthetic data sets, as well as the potential of NBED to perform anomaly detection. By taking a topological interpretation of non-backtracking cycles, this work presents a novel application of topological data analysis to the study of complex networks.
Keywords