IEEE Access (Jan 2020)
Data Set Synthesis Based on Known Correlations and Distributions for Expanded Social Graph Generation
Abstract
Data created through the use of various services are usually unavailable to the average researcher. Security and privacy concerns have further restricted access to real-life data, particularly for social networks. Synthetic data generators, and synthetic social graph generators in particular, have therefore become an important area of research. However, most such generators still create graphs that record only whether two nodes are connected. An existing conceptual solution for an expanded social graph generator aims to generate synthetic graphs with multiple weighted edges between nodes, representing various types of relationships among those nodes, all based on known real-life data characteristics. One of its proposed steps is the generation of the necessary data according to provided distributions and correlations. This paper focuses on that step: it adapts an existing iterative algorithm for non-normal multivariate data simulation to generate synthetic data based on the publicly available distributions and correlations of Facebook interaction parameters. The characteristics of the generated synthetic data are shown to be similar to the known characteristics of the real-life data, demonstrating that the chosen algorithm, with the accompanying alterations, can serve as one of the steps in generating a synthetic expanded social graph.
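As a rough illustration of what generating data "according to provided distributions and correlations" can involve, the sketch below uses a simplified rank-reordering loop. It is not the algorithm adapted in this paper. The function names, the exponential and gamma marginals, and the target correlation of 0.6 are all assumptions chosen for the example, and the clipping step only guarantees a valid intermediate matrix in the two-variable case.

```python
import numpy as np

def reorder_to_ranks(values, reference):
    """Rearrange `values` so that its rank order matches `reference`."""
    ranks = np.argsort(np.argsort(reference))
    return np.sort(values)[ranks]

def simulate(marginals, target_corr, iterations=20, seed=0):
    """Hypothetical helper: keep each column's marginal sample intact while
    nudging the Pearson correlations toward `target_corr`."""
    rng = np.random.default_rng(seed)
    n, k = marginals.shape
    base = rng.standard_normal((n, k))           # drawn once, reused every pass
    target_corr = np.array(target_corr, dtype=float)
    inter = target_corr.copy()                   # intermediate correlation matrix
    for _ in range(iterations):
        # Correlated normals that only supply the rank order of each column.
        z = base @ np.linalg.cholesky(inter).T
        data = np.column_stack(
            [reorder_to_ranks(marginals[:, j], z[:, j]) for j in range(k)]
        )
        achieved = np.corrcoef(data, rowvar=False)
        # Push the intermediate matrix by the remaining error (assumed update rule).
        inter = np.clip(inter + (target_corr - achieved), -0.99, 0.99)
        np.fill_diagonal(inter, 1.0)
    return data, achieved

# Illustrative skewed marginals; real inputs would be the known distributions.
sample_rng = np.random.default_rng(1)
marginals = np.column_stack(
    [sample_rng.exponential(1.0, 5000), sample_rng.gamma(2.0, 1.0, 5000)]
)
target = np.array([[1.0, 0.6], [0.6, 1.0]])
data, achieved = simulate(marginals, target)
```

Because each output column is only a reordering of the corresponding input column, the marginal distributions are preserved exactly. The loop adjusts only the dependence between columns.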
Keywords