Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Thaer Thaher; Mahmoud Saheb; Hamza Turabieh; Hamouda Chantar

doi:10.3390/sym13040556

Symmetry (Mar 2021)

Affiliations

Thaer Thaher: Department of Engineering and Technology Sciences, Arab American University, P.O. Box 240 Jenin, Palestine
Mahmoud Saheb: IT and Computer Engineering College, Palestine Polytechnic University, P.O. Box 198 Hebron, Palestine
Hamza Turabieh: Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Hamouda Chantar: Faculty of Information Technology, Sebha University, Sebha 18758, Libya

DOI: https://doi.org/10.3390/sym13040556
Journal volume & issue: Vol. 13, no. 4, p. 556

Abstract

False information on social media platforms is a significant challenge: rumors, propaganda, and deceptive claims about a person, organization, or service deliberately mislead users. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers' attention to the goal of providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets using Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. Features are extracted with the Bag of Words (BoW) model under different term-weighting schemes. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with Term Frequency-Inverse Document Frequency (TF-IDF) weighting ranks best. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model's performance for fake news detection. The proposed BHHO-LR model yields an improvement of 5% over previous works on the same dataset.
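
The idea of wrapper-based feature selection over TF-IDF features can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy English tokens stand in for Arabic tweets, a smoothed IDF variant is used, a nearest-centroid classifier stands in for Logistic Regression, an exhaustive search over binary masks stands in for the binary HHO search, and the fitness weighting `alpha = 0.99` is a hypothetical choice.

```python
import math
from collections import Counter
from itertools import product


def tfidf(docs):
    """TF-IDF weights for tokenized documents (smoothed IDF, an assumed variant)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, matrix


def loo_error(X, y, mask):
    """Leave-one-out error of a nearest-centroid classifier on the masked columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    errors = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

        def dist(label):
            return sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroids[label]))

        predicted = min(sorted(centroids), key=dist)
        errors += predicted != y[i]
    return errors / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Wrapper fitness: weighted sum of error and fraction of features kept.

    Lower is better. alpha is a hypothetical weighting, not taken from the paper.
    """
    if not any(mask):
        return float("inf")  # an empty feature subset cannot classify anything
    return alpha * loo_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


# Made-up toy corpus: three "fake" and three "real" tokenized messages.
docs = [
    ["shocking", "secret", "cure"],
    ["secret", "cure", "revealed"],
    ["shocking", "cure", "share"],
    ["ministry", "report", "today"],
    ["official", "report", "today"],
    ["ministry", "official", "statement"],
]
y = ["fake", "fake", "fake", "real", "real", "real"]

vocab, X = tfidf(docs)

# Exhaustive search over all binary masks; a metaheuristic such as binary HHO
# would replace this loop when the vocabulary is too large to enumerate.
best_mask = min(product([0, 1], repeat=len(vocab)), key=lambda m: fitness(X, y, m))
selected = [t for t, keep in zip(vocab, best_mask) if keep]
print("selected features:", selected)
print("fitness (all features):", fitness(X, y, [1] * len(vocab)))
print("fitness (selected):", fitness(X, y, best_mask))
```

The fitness function rewards both low classification error and a small feature subset, which is why a wrapper search can shrink the feature space without hurting accuracy. Enumerating all masks is only feasible for a toy vocabulary; with thousands of word features, a population-based search is needed to explore the mask space.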

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
