Applied Sciences (Mar 2023)

SEBD: A Stream Evolving Bot Detection Framework with Application of PAC Learning Approach to Maintain Accuracy and Confidence Levels

  • Eiman Alothali,
  • Kadhim Hayawi,
  • Hany Alashwal

DOI
https://doi.org/10.3390/app13074443
Journal volume & issue
Vol. 13, no. 7
p. 4443

Abstract

Read online

A simple supervised learning model can predict a class from trained data based on the previous learning process. Trust in such a model can be gained through evaluation measures that ensure fewer misclassification errors in prediction results for different classes. This can be applied to supervised learning using a well-trained dataset that covers different data points and has no imbalance issues. This task is challenging when it integrates a semi-supervised learning approach with a dynamic data stream, such as social network data. In this paper, we propose a stream-based evolving bot detection (SEBD) framework for Twitter that uses a deep graph neural network. Our SEBD framework was designed based on multi-view graph attention networks using fellowship links and profile features. It integrates Apache Kafka to enable the Twitter API stream and predict the account type after processing. We used a probably approximately correct (PAC) learning framework to evaluate SEBD’s results. Our objective was to maintain the accuracy and confidence levels of our framework to enable successful learning with low misclassification errors. We assessed our framework results via cross-domain evaluation using test holdout, machine learning classifiers, benchmark data, and a baseline tool. The overall results show that SEBD is able to successfully identify bot accounts in a stream-based manner. Using holdout and cross-validation with a random forest classifier, SEBD achieved an accuracy score of 0.97 and an AUC score of 0.98. Our results indicate that bot accounts participate highly in hashtags on Twitter.

Keywords