Identifying Persian bots on Twitter; which feature is more important: Account Information or Tweet Contents?

Mojtaba Mazoochi; Nasrin Asadi; Farzaneh Rahmani; Leila Rabiei

International Journal of Information and Communication Technology Research (Feb 2023)

Affiliations

Mojtaba Mazoochi: Information Technology Research Faculty, ICT Research Institute, Tehran, Iran
Nasrin Asadi: Information Technology Research Faculty, ICT Research Institute, Tehran, Iran
Farzaneh Rahmani: Information Technology Research Faculty, ICT Research Institute, Tehran, Iran
Leila Rabiei: Information Technology Research Faculty, ICT Research Institute, Tehran, Iran

Journal volume & issue: Vol. 15, no. 1
pp. 35 – 44

Abstract

The spread of the internet and smartphones in recent years has made social networks popular and easily accessible. Despite the benefits of these networks, such as ease of interpersonal communication and a space for free expression of opinions, they also create opportunities for destructive activities such as spreading false information or using fake accounts for fraud. Fake accounts are mainly managed by bots, so identifying and suspending bots could help considerably to increase the popularity and favorability of social networks. In this paper, we try to identify Persian bots on Twitter, a challenging task given the problems of processing colloquial Persian. To this end, a set of features based on user account information and user activity is added to content features of tweets, and users are classified with several machine learning algorithms, including Random Forest, Logistic Regression and SVM. Experiments on a dataset of Persian-language users show that the proposed methods perform well. Random Forest is the most accurate of these classifiers, achieving a balanced accuracy of 93.86%.
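
The abstract reports balanced accuracy but does not define it. As a minimal sketch, assuming the standard definition (the mean of per-class recall, which for two classes is the average of the true-positive and true-negative rates), the metric can be computed as follows. The function name `balanced_accuracy` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (TPR + TNR) / 2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("empty input")
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = bot, 0 = human (toy data, not from the paper).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this measure weights each class equally, which matters when bots are a small minority of the accounts in a dataset.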

Published in International Journal of Information and Communication Technology Research

ISSN: 2251-6107 (Print); 2783-4425 (Online)
Publisher: Iran Telecom Research Center
Country of publisher: Iran, Islamic Republic of
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://ijict.itrc.ac.ir/
