Detecting bots in social-networks using node and structural embeddings

Ashkan Dehghan; Kinga Siuta; Agata Skorupka; Akshat Dubey; Andrei Betlen; David Miller; Wei Xu; Bogumił Kamiński; Paweł Prałat

doi:10.1186/s40537-023-00796-3

Journal of Big Data (Jul 2023)

Affiliations

Ashkan Dehghan: Toronto Metropolitan University
Kinga Siuta: Toronto Metropolitan University
Agata Skorupka: Toronto Metropolitan University
Akshat Dubey: Toronto Metropolitan University
Andrei Betlen: Patagona Technologies
David Miller: Patagona Technologies
Wei Xu: Toronto Metropolitan University
Bogumił Kamiński: SGH Warsaw School of Economics
Paweł Prałat: Toronto Metropolitan University

DOI: https://doi.org/10.1186/s40537-023-00796-3
Journal volume & issue: Vol. 10, no. 1
pp. 1 – 37

Abstract

Users on social networks such as Twitter interact with each other without much knowledge of the real identity behind the accounts they interact with. This anonymity has created a perfect environment for bot accounts to influence the network by mimicking real-user behaviour. Although not all bot accounts have malicious intent, identifying bot accounts in general is an important and difficult task. In the literature there are three distinct types of feature sets one could use for building machine learning models for classifying bot accounts. These feature sets are: user profile metadata, natural language processing (NLP) features extracted from user tweets, and features extracted from the underlying social network. Profile metadata and NLP features are typically explored in detail in the bot-detection literature. At the same time, less attention has been given to the predictive power of features that can be extracted from the underlying network structure. To fill this gap, we explore and compare two classes of embedding algorithms that can be used to take advantage of the information that network structure provides. The first class consists of classical embedding techniques, which focus on learning proximity information. The second class consists of structural embedding algorithms, which capture the local structure of a node's neighbourhood. We show that features created using structural embeddings have higher predictive power when it comes to bot detection. This supports the hypothesis that the local social network formed around bot accounts on Twitter contains valuable information that can be used to identify bot accounts.

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
