IEEE Access (Jan 2019)
A Parallel Community Detection in Multi-Modal Social Network With Apache Spark
Abstract
A constrained latent space model (CLSM) infers community membership from two modalities in multi-modal social network data: network topology and node attributes. In this paper, we extend our previous model, CLSM, in two ways. First, we introduce a Spark implementation of CLSM for parallel computation by fitting the inference algorithm into the map-reduce framework. Second, in addition to network homophily, we consider user reputation, which also affects social interactions between users. We test CLSM and its extension on two real-world problems: understanding link and user attributes in a location-based social network and in a review-trust network. Our proposed Spark models can be easily deployed on commercial cloud services, such as Google Cloud or Amazon Web Services, to find latent community membership in large-scale datasets. We perform extensive experiments on real-world datasets and show how the CLSM extension improves on our previous model. We also share meaningful insights that we discovered in the datasets.
Keywords