PLoS ONE (Jan 2012)
One plus one makes three (for social networks).
Abstract
Members of social network platforms often choose to reveal private information, and thus sacrifice some of their privacy, in exchange for the manifold opportunities and amenities offered by such platforms. In this article, we show that the seemingly innocuous combination of knowledge of confirmed contacts between members on the one hand and their email contacts to non-members on the other hand provides enough information to deduce a substantial proportion of relationships between non-members. Using machine learning we achieve an area under the (receiver operating characteristic) curve (AUC) of at least 0.85 for predicting whether two non-members known by the same member are connected or not, even for conservative estimates of the overall proportion of members, and the proportion of members disclosing their contacts.