IEEE Access (Jan 2020)

Benefits of Bias in Crawl-Based Network Sampling for Identifying Key Node Set

  • Sho Tsugawa,
  • Hiroyuki Ohsaki

DOI
https://doi.org/10.1109/ACCESS.2020.2988910
Journal volume & issue
Vol. 8
pp. 75370 – 75380

Abstract

Read online

We study the problem of identifying a set of key nodes from a network when limited knowledge about its structure is available. Most studies assume complete knowledge of the given network when identifying a set of key nodes, but in current practice, networks of interest are often too huge to obtain their entire topological structures. When the complete structure of a network is not available, network sampling strategies are often used to obtain a partial structure of the network. We investigate how network sampling strategies affect the problem of identifying a key node set. Specifically, we investigate the effect of conventional network sampling strategies on the solutions found for two types of key node set identification problems: the minimum $p$ -median problem and the influence maximization problem. Our results show that when the network is obtained using crawl-based network sampling strategies, both the minimum $p$ -median and the influence maximization problems are effectively solved by simple heuristic algorithms with sampling ratios in the 10-20% range. We also find that among three conventional sampling strategies (random sampling, random walk sampling, and sample edge counts) checked in this paper, random walk sampling is the most robust strategy in terms of effectively identifying the key node sets of diverse types of networks.

Keywords