Feature extractions and selection of bot detection on Twitter: A systematic literature review

Raad Al-azawi; Safaa O. AL-mamory

doi:10.4114/intartif.vol25iss69pp57-86

Inteligencia Artificial (Apr 2022)

Affiliations

Raad Al-azawi: University of Babylon, Iraq
Safaa O. AL-mamory: College of Business Informatics, Bagdad, Iraq

DOI: https://doi.org/10.4114/intartif.vol25iss69pp57-86
Journal volume & issue: Vol. 25, no. 69

Abstract

Social bots are automated or semi-automated computer programs that imitate humans or human behavior in online social networks. Social bots can attack users in pursuit of hidden aims, such as spreading information or influencing targets. While researchers develop a variety of methods to detect social media bot accounts, attackers adapt their bots to avoid detection. The field therefore requires ongoing development, particularly in feature extraction and selection. This study provides an overview of bot attacks on Twitter, sheds light on issues in feature extraction and selection that significantly affect the accuracy of bot detection algorithms, and highlights weaknesses in training time and dimensionality reduction. To the best of our knowledge, this study is the first systematic literature review based on a preset search strategy that covers literature on Twitter features (attributes) published between 2018 and 2021. The key findings of this research are threefold. First, the paper provides an improved taxonomy of feature extraction and selection approaches. Second, it includes a comprehensive overview of approaches for detecting bots on the Twitter platform, particularly machine learning techniques. Percentages were calculated using the proposed taxonomy: metadata, tweet text, and merged (metadata and tweet text) features account for 37%, 31%, and 32%, respectively. Third, it highlights several gaps for further research: (1) public datasets are neither precise nor of suitable size; (2) integrated systems and real-time detection are rarely used; and (3) each identified bot category needs to be detected separately, rather than detecting all categories with one generic model and the same feature values. Finally, extracting influential features that help machine learning algorithms detect Twitter bots with high accuracy is critical, especially when the type of bot is predetermined.

Published in Inteligencia Artificial

ISSN: 1137-3601 (Print); 1988-3064 (Online)
Publisher: Asociación Española para la Inteligencia Artificial
Country of publisher: Spain
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://journal.iberamia.org
