Известия Иркутского государственного университета: Серия "Математика" (Sep 2018)

Graph Clustering Based on Modularity Variation Estimations

  • N. N. Martynov,
  • O. V. Khandarova,
  • F. V. Khandarov

DOI
https://doi.org/10.26516/1997-7670.2018.25.63
Journal volume & issue
Vol. 25, no. 1
pp. 63 – 78

Abstract

Read online

Graph clustering is one of the constantly actual data analysis problems. There are various statements of this problem. In this paper we consider the statement of search for a partition of a graph vertices set into disjoint subsets. In such a way, that the density of connections between the vertices of one subset was higher than that between the vertices of different subsets. There is a lot of various approaches, and many of them use such as an a posteriori estimate of clustering quality, as modularity. The modularity functional, taking the value from [-1, 1], allows us to formally evaluate the quality of partitioning into subsets. This paper deals with an approach, instead of calculating the modularity, using a less computationally complex estimate of modularity change in the operation of clusters union. Four theorems for different graph types are formulated, presenting the possibility of application of suggested estimate, instead of direct modularity calculations. A greedy algorithmic scheme and also AMVE (Algorithm based on Modularity Variation Estimation) --- simple greedy algorithm based on the scheme are proposed. An attempt of comparative analysis on the test problemet of AMVE with heuristic clustering algorithms implemented in actual data analysis software is described. It is shown the comparative advantage of AMVE, both in terms of speed and quality of clustering. Also, the cases on the use of developed algorithm and its implementation for data analysis in sociology and history and criticism of literature are mentioned. In these cases, investigated graphs based on social networks data (the ratio of "friendship" in the social network between users used as the graph edges). It is demonstrated a slight superiority of AMVE in clustering quality compared to the known algorithms Louvain and Walktrap.

Keywords