Data & Policy (Jan 2024)
Inferring social networks from unstructured text data: A proof of concept detection of hidden communities of interest
Abstract
Social network analysis is known to provide a wealth of insights relevant to many aspects of policymaking. Yet the social data needed to construct social networks are not always available. Furthermore, even when they are, interpreting such networks often relies on extraneous knowledge. Here, we propose an approach to infer social networks directly from the texts produced by actors and from the terminological similarities that these texts exhibit. The approach relies on fitting a topic model to the texts produced by these actors and measuring the correlations between the actors' topic profiles. This reveals what can be called "hidden communities of interest," that is, groups of actors who share similar semantic content but whose social relationships with one another may be unknown or only implicit. The resulting networks are then interpreted through the topics of the fitted model. A diachronic perspective can also be built by modeling the networks over different time periods and mapping genealogical relationships between communities. As a case study, the approach is deployed on a working corpus of academic articles in the philosophy of science (N = 16,917).
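The core pipeline summarized above (topic profiles per actor, correlations between profiles, communities in the resulting network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a topic model has already been fitted and yields a document-topic matrix; a small hand-made array stands in for that output. Each actor's profile is taken, as an assumption, to be the mean of the topic distributions of their documents. Edges are kept where the Pearson correlation between two profiles exceeds a hypothetical threshold of 0.5. Connected components of the thresholded graph stand in for a proper community-detection algorithm. The function names `actor_topic_profiles`, `correlation_network`, and `connected_communities` are hypothetical.

```python
import numpy as np


def actor_topic_profiles(doc_topic, doc_actor, n_actors):
    # Hypothetical helper: average the topic distributions of each actor's documents.
    # Assumes every actor authored at least one document.
    profiles = np.zeros((n_actors, doc_topic.shape[1]))
    np.add.at(profiles, doc_actor, doc_topic)  # sum rows per actor
    counts = np.bincount(doc_actor, minlength=n_actors)
    return profiles / counts[:, None]


def correlation_network(profiles, threshold=0.5):
    # Pearson correlation between actor profiles (rows are actors),
    # then an adjacency matrix keeping only correlations above the threshold.
    corr = np.corrcoef(profiles)
    adj = corr > threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return corr, adj


def connected_communities(adj):
    # Stand-in for community detection: label the connected components
    # of the thresholded graph with an iterative depth-first search.
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in np.flatnonzero(adj[node]):
                if labels[nbr] == -1:
                    labels[nbr] = current
                    stack.append(nbr)
        current += 1
    return labels


# Toy stand-in for a fitted topic model: 6 documents over 3 topics.
doc_topic = np.array([
    [0.80, 0.10, 0.10],  # actor 0
    [0.70, 0.20, 0.10],  # actor 0
    [0.75, 0.15, 0.10],  # actor 1
    [0.10, 0.10, 0.80],  # actor 2
    [0.10, 0.20, 0.70],  # actor 3
    [0.20, 0.10, 0.70],  # actor 3
])
doc_actor = np.array([0, 0, 1, 2, 3, 3])

profiles = actor_topic_profiles(doc_topic, doc_actor, n_actors=4)
corr, adj = correlation_network(profiles, threshold=0.5)
labels = connected_communities(adj)
print(labels)  # [0 0 1 1]
```

In this toy case, actors 0 and 1 emphasize the first topic and actors 2 and 3 the third, so two groups emerge from textual similarity alone, with no social data supplied. On a real corpus, a modularity-based method such as Louvain would be a more typical choice than connected components, and the threshold would need to be justified rather than fixed at 0.5.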
Keywords