IEEE Access (Jan 2021)
A Label Propagation Based Node Clustering Algorithm in Heterogeneous Information Networks
Abstract
With the rapid development of network technology, a great amount of data has been accumulated, and much of it is organized as heterogeneous information networks (HINs). Mining HINs efficiently is therefore very important, and node clustering is an essential part of this task. Several clustering algorithms have been proposed, but because they all involve complicated optimization and matrix calculation procedures, their complexity is very high. To overcome this shortcoming, this paper proposes a new clustering algorithm. Its inputs are a heterogeneous information network, the meta-paths to be used, and the names of the target types. During clustering, the proposed algorithm first builds a homogeneous network in which the target objects of the HIN are the nodes and the instances of the meta-paths are the edges. Label propagation is then performed on this homogeneous network to obtain the clustering result. The proposed algorithm thus eliminates the complex optimization and matrix calculation procedures, and because label propagation converges quickly, it is very efficient. Moreover, since label propagation can be executed in parallel, the proposed algorithm is easy to parallelize, which makes it suitable for processing large-scale HINs on a server cluster. Experimental results show that the proposed algorithm runs faster than all the other algorithms used for comparison.
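To make the two stages described above more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The HIN is represented as a hypothetical list of typed links `(u, type_u, v, type_v)`. The hypothetical function `build_homogeneous_network` projects the HIN onto the target type by counting instances of a single symmetric meta-path (for example Author-Paper-Author), and the hypothetical function `label_propagation` then runs asynchronous weighted label propagation. The abstract does not say how several meta-paths are combined or how ties are broken, so this sketch handles only one meta-path and uses deterministic tie-breaking, whereas standard label propagation breaks ties randomly.

```python
from collections import defaultdict


def build_homogeneous_network(edges, meta_path):
    """Project an HIN onto its target type along one meta-path (sketch).

    edges: iterable of (u, type_u, v, type_v) typed links, treated as undirected.
    meta_path: tuple of type names, e.g. ("author", "paper", "author"),
    assumed to start and end with the target type.
    Returns {node: {neighbor: number of meta-path instances}}.
    """
    adj = defaultdict(lambda: defaultdict(list))  # adj[(t1, t2)][u] -> [v, ...]
    node_type = {}
    for u, tu, v, tv in edges:
        adj[(tu, tv)][u].append(v)
        adj[(tv, tu)][v].append(u)
        node_type[u], node_type[v] = tu, tv

    target = meta_path[0]
    weights = {}
    for start in sorted(n for n, t in node_type.items() if t == target):
        counts = {start: 1}  # number of partial path instances ending at each object
        for t_cur, t_next in zip(meta_path, meta_path[1:]):
            step = defaultdict(int)
            for obj, c in counts.items():
                for nxt in adj[(t_cur, t_next)].get(obj, []):
                    step[nxt] += c
            counts = step
        counts.pop(start, None)  # drop self-loops
        weights[start] = dict(counts)
    return weights


def label_propagation(weights, max_iter=100):
    """Asynchronous weighted label propagation with deterministic tie-breaking."""
    labels = {n: n for n in weights}  # every node starts with its own label
    for _ in range(max_iter):
        changed = False
        for node in sorted(weights):
            scores = defaultdict(float)
            for nbr, w in weights[node].items():
                scores[labels[nbr]] += w  # edge weight = meta-path instance count
            if not scores:
                continue  # an isolated node keeps its own label
            best = max(scores.values())
            if scores.get(labels[node]) == best:
                continue  # keep the current label on ties to avoid oscillation
            labels[node] = min(lab for lab, s in scores.items() if s == best)
            changed = True
        if not changed:  # converged: no label changed in a full pass
            break
    return labels


edges = [
    ("a1", "author", "p1", "paper"), ("a2", "author", "p1", "paper"),
    ("a3", "author", "p1", "paper"), ("a4", "author", "p2", "paper"),
    ("a5", "author", "p2", "paper"),
]
net = build_homogeneous_network(edges, ("author", "paper", "author"))
print(label_propagation(net))
# → {'a1': 'a2', 'a2': 'a2', 'a3': 'a2', 'a4': 'a5', 'a5': 'a5'}
```

In this toy network, authors a1 to a3 share paper p1 and authors a4 and a5 share paper p2, so the sketch converges to two clusters after two passes. The sequential, sorted update order keeps the result reproducible; a parallel variant would typically update labels synchronously, which is easier to distribute but can oscillate on bipartite-like structures.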
Keywords