IEEE Access (Jan 2020)

Flow-Based Clustering on Directed Graphs: A Structural Entropy Minimization Approach

  • Yuan Yao,
  • Yicheng Pan,
  • Shaojiang Wang,
  • Hongan Wang,
  • Waqas Nazeer

DOI
https://doi.org/10.1109/ACCESS.2019.2960725
Journal volume & issue
Vol. 8
pp. 152579 – 152591

Abstract

Read online

In this study, we focus on flow-based clustering on directed graphs and propose a localized algorithm for this problem. Flow-based clustering in networks requires a set of closely related vertices where the flow amount getting into it is larger than that going out of it. It is able to formulate a variety of practical problems, such as fund-raising set detection in financial networks and influential document clustering in citation networks, etc. Methodologically, we propose the new concept of two-dimensional structural entropy on directed graphs, and based on this, a local structural entropy minimization algorithm for detecting the flow-based community structure of networks is designed. We adopt our algorithm for the problem of fund-raising set detection in financial networks, in which vertices represent accounts, edges represent transactions between two accounts, and weights represent money amounts of transactions. In our experiments, the local two-dimensional structure entropy minimization algorithm is devoted to find a fund-raising community which involves a given input account. We conduct experiments on both synthetic and real fund-raising datasets. The experimental results demonstrate that, given a fixed account, our algorithm is able to efficiently locate a fund-raising community (if any) for which the fund flowing into the community is much higher than that flowing out, and the transactions within the community are relatively denser (fund amount based) than that of inter-community. For a synthetic ground-truth fund-raising community, we adjust the parameters to change its fund-raising tendency. The results for the synthetic datasets show that our algorithm obtains higher precision and recall rates as this tendency gets stronger with each single factor varing. For a real fund-raising community embedded in a simulated capital flow network, our algorithm also find it with high precision and recall rates. The experiments for both scenarios verify the effectiveness of our algorithm.

Keywords