IEEE Access (Jan 2020)

Fast Representative Sampling in Large-Scale Online Social Networks

  • Guangren Cai,
  • Gang Lu,
  • Junxia Guo,
  • Cheng Ling,
  • Ruiqi Li

DOI
https://doi.org/10.1109/ACCESS.2020.2989504
Journal volume & issue
Vol. 8
pp. 77106 – 77119

Abstract

Read online

Online social networks (OSNs) have become important platforms for efficiently connecting people and promoting information dissemination, which is of great importance to our social life and society. However, due to privacy concerns, and access limitations, it is difficult to obtain the whole network of OSNs for analysis, so it is critical to have a representative subgraph. Yet due to the same reasons, we are in lack of the original network as the ground truth which poses great challenges on evaluating sampling methods on the performance on unbiasedness, let alone representativeness. Thus uniform sampling (UNI) [Gjoka et al. 2010] was proposed to obtain an unbiased nodal property distribution as of the original network to evaluate the degree of bias of other methods. Yet UNI sampling suffers from its low efficiency, and the representativeness and connectivity of the obtained subgraph, which is formed by the sampled nodes and connections between them, are rarely studied. We propose an adaptive UNI sampling (adpUNI) method to overcome previously mentioned disadvantages of UNI by dividing the userID space into several intervals, whose sampling probability adaptively changes based on its target rate. By adding its neighbors of the targeted node into the sample set (adpUNI+N), we can further improve the performance on sampling efficiency and obtain a more connective and representative subgraph. When applied to Sina Weibo and Twitter, our methods over-perform other classical methods on sampling efficiency, and always have a better performance on connectivity and representativeness than UNI sampling. And we also find that an unbiased sample doesn't guarantee a more representative subgraph.

Keywords