IEEE Access (Jan 2018)

Hete_MESE: Multi-Dimensional Community Detection Algorithm Based on Multiplex Network Extraction and Seed Expansion for Heterogeneous Information Networks

  • Meilian Lu,
  • Zhihe Qu,
  • Ziheng Wang,
  • Zhenglin Zhang

DOI
https://doi.org/10.1109/ACCESS.2018.2883638
Journal volume & issue
Vol. 6
pp. 73965 – 73983

Abstract


Most real-world information networks are heterogeneous: they contain multiple entity types and the relations between these entities. Large scale and heterogeneity are typical properties of heterogeneous information networks, and their community structures are often overlapping, complex, and diverse. Existing community detection algorithms that do not account for these properties may suffer from low accuracy or high time complexity. In this paper, we study the problem of multi-dimensional heterogeneous community detection in large-scale heterogeneous information networks with general topologies. We define the concept of a node-centric community and propose a multi-dimensional community detection algorithm, referred to as Hete_MESE. Specifically, we first designate one of the entity types in the heterogeneous information network as the community-centric node type and extract the corresponding multiplex networks. Overlapping node-centric communities are then detected on these multiplex networks and used as seed communities, which absorb nodes of the other entity types through seed expansion to form heterogeneous communities. Using a heterogeneous academic network as an example, we evaluate the performance of Hete_MESE through extensive experiments and analyze the features of the detected multi-dimensional academic communities. The results demonstrate that Hete_MESE can accurately and effectively detect meaningful overlapping and heterogeneous communities in heterogeneous information networks with general topologies from multiple dimensions. Moreover, Hete_MESE has linear time complexity, so it can be applied to large-scale heterogeneous information networks.
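The pipeline summarized in the abstract (choose a center node type, extract a multiplex network, detect overlapping seed communities, then expand the seeds with the other entity types) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the tiny author/paper/organization/venue data, the rule that links two center nodes in layer t whenever they share a neighbor of type t, the use of maximal cliques (Bron-Kerbosch) as a stand-in for the overlapping seed detection step the abstract does not specify, and the neighbor-fraction threshold used for expansion.

```python
# Hypothetical toy sketch of a center-type / multiplex / seed-expansion pipeline.
# NOT the Hete_MESE implementation; all names, data, and rules are illustrative.
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets for an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def extract_multiplex(adj, node_type, center):
    """Assumed rule: one layer per non-center type t; center nodes u, v are
    linked in layer t when they share a neighbor of type t (path center-t-center)."""
    layers = defaultdict(set)
    for mid, nbrs in adj.items():
        t = node_type[mid]
        if t == center:
            continue
        centers = sorted(n for n in nbrs if node_type[n] == center)
        for u, v in combinations(centers, 2):
            layers[t].add((u, v))  # sorted pair, so keys match across layers
    return dict(layers)

def maximal_cliques(adj, min_size=2):
    """Bron-Kerbosch without pivoting; returned cliques may overlap."""
    found = []
    def bk(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    bk(set(), set(adj), set())
    return sorted(found, key=sorted)  # deterministic order

def seed_communities(layers, min_layers=1):
    """Stand-in seed detection: maximal cliques of the graph whose edges
    appear in at least `min_layers` layers."""
    count = defaultdict(int)
    for edges in layers.values():
        for e in edges:
            count[e] += 1
    agg = build_adjacency(e for e, c in count.items() if c >= min_layers)
    return maximal_cliques(agg)

def expand_seed(seed, adj, node_type, center, threshold=0.5):
    """Assumed expansion rule: synchronously absorb non-center nodes whose
    fraction of neighbors inside the community is >= threshold, until stable."""
    community = set(seed)
    while True:
        added = {
            n for n, nbrs in adj.items()
            if n not in community and node_type[n] != center
            and len(nbrs & community) / len(nbrs) >= threshold
        }
        if not added:
            return community
        community |= added

# Hypothetical toy academic network.
NODE_TYPE = {**{f"a{i}": "author" for i in range(1, 6)},
             **{f"p{i}": "paper" for i in range(1, 5)},
             "o1": "org", "o2": "org", "v1": "venue", "v2": "venue"}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
         ("a1", "p3"), ("a3", "p3"), ("a4", "p4"), ("a5", "p4"),
         ("a1", "o1"), ("a2", "o1"), ("a3", "o1"),
         ("a3", "o2"), ("a4", "o2"), ("a5", "o2"),
         ("p1", "v1"), ("p2", "v1"), ("p3", "v1"), ("p4", "v2")]

adj = build_adjacency(EDGES)
layers = extract_multiplex(adj, NODE_TYPE, "author")
seeds = seed_communities(layers)
communities = [expand_seed(s, adj, NODE_TYPE, "author") for s in seeds]
for c in communities:
    print(sorted(c))
# ['a1', 'a2', 'a3', 'o1', 'p1', 'p2', 'p3', 'v1']
# ['a3', 'a4', 'a5', 'o2', 'p4', 'v2']
```

In this toy run, author a3 belongs to both seeds, so the resulting heterogeneous communities overlap, and venues are absorbed only in the second expansion round, after their papers have joined. Clique enumeration is exponential in the worst case, so this sketch does not reproduce the linear time complexity reported in the abstract.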

Keywords