IEEE Access (Jan 2023)

CrediBot: Applying Bot Detection for Credibility Analysis on Twitter

  • Ana Aguilera,
  • Pamela Quinteros,
  • Irvin Dongo,
  • Yudith Cardinale

DOI
https://doi.org/10.1109/ACCESS.2023.3320687
Journal volume & issue
Vol. 11
pp. 108365 – 108385

Abstract

People and organizations use social networks to share information among groups with similar interests. Given the wide range of users on these platforms and the volume of information generated within them, the presence of bots has become a relevant issue, whether they facilitate the sharing of true information or disseminate false information (fake news). In the latter case, bots can manipulate political opinions and perpetrate identity or information theft, among other dangers they pose when interacting on the platform. Identifying bots in social networks can therefore be a useful practice for evaluating credibility or detecting fake news. In this work, we extend a previously proposed credibility model for Twitter by incorporating bot detection. The original model calculates the credibility of tweets based on three measures: text, account/user, and social impact. It applies filters to analyse the text (spam, bad words, and spelling) and uses account attributes (e.g., creation date, followers, following) to calculate the account/user and social credibility scores. The extended model incorporates bot verification into the user credibility measure. Additionally, the extended credibility model is implemented in T-CREo, a framework for real-time credibility analysis. For bot detection, several supervised machine learning algorithms, namely AdaBoost, Bagging, Decision Tree, Logistic Regression, and Random Forest, are trained and evaluated. Results show that Random Forest is the best algorithm because of its generalization capacity, with accuracy and F1-score values over 97% in both English and Spanish. The evaluation of the bot detection functionality within the credibility analysis shows precision=1.0, recall=0.8462, F1-score=0.9167, and accuracy=0.92 for both the English and Spanish models in our validation tests.
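
As a quick consistency check on the validation figures reported in the abstract, the F1-score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only. The function names are ours, and the confusion-matrix counts are hypothetical: they were chosen because they reproduce the reported metrics, whereas the abstract does not state the size or composition of the validation set.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int):
    """Return (precision, recall, F1, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1_score(precision, recall), accuracy


# Reported values from the abstract: precision=1.0, recall=0.8462.
print(round(f1_score(1.0, 0.8462), 4))  # 0.9167, matching the reported F1-score

# HYPOTHETICAL counts (not from the paper) that happen to give the same figures.
p, r, f1, acc = metrics_from_counts(tp=11, fp=0, fn=2, tn=12)
print(round(p, 4), round(r, 4), round(f1, 4), round(acc, 4))
```

A precision of 1.0 means no account was wrongly flagged as a bot in these tests, while the recall below 1.0 means some bots went undetected.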

Keywords