Fake News Classification Based on Content Level Features

Chun-Ming Lai; Mei-Hua Chen; Endah Kristiani; Vinod Kumar Verma; Chao-Tung Yang

doi:10.3390/app12031116

Applied Sciences (Jan 2022)

Affiliations

Chun-Ming Lai: Department of Computer Science, Tunghai University, Taichung City 407224, Taiwan
Mei-Hua Chen: Department of Foreign Languages and Literature, Tunghai University, Taichung City 407224, Taiwan
Endah Kristiani: Department of Computer Science, Tunghai University, Taichung City 407224, Taiwan
Vinod Kumar Verma: Department of Computer Science & Engineering, Sant Longowal Institute of Engineering & Technology (SLIET), Longowal 148106, India
Chao-Tung Yang: Department of Computer Science, Tunghai University, Taichung City 407224, Taiwan

Journal volume & issue: Vol. 12, no. 3, p. 1116

Abstract

Because online social media (OSM) is open and easily accessible, anyone can post a short paragraph of text to express an opinion on an article they have seen. In the absence of access control mechanisms, many suspicious messages and accounts have been reported spreading across multiple platforms. Identifying and labeling fake news is therefore a demanding problem, given the massive amount of heterogeneous content. Machine learning (ML) and natural language processing (NLP) can enhance, speed up, and automate the analytical process, so that unstructured text can be transformed into meaningful data and insights. In this paper, a combination of ML and NLP is implemented to classify fake news based on an open, large, and labeled corpus from Twitter. We compare several state-of-the-art ML and neural network models based on content-only features. To enhance classification performance, term frequency-inverse document frequency (TF-IDF) features were applied before training the ML models, while word embedding was utilized in neural network training. With these methods, all the traditional models achieve greater than 85% accuracy, and all the neural network models achieve greater than 90% accuracy. In the experiments, the neural network models outperform the traditional ML models by approximately 6% in precision on average.
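To make the TF-IDF step concrete, the following is a minimal sketch of content-only classification using only the Python standard library. It is not the authors' pipeline: the four training texts and their labels are invented for illustration, the helper names (`fit_idf`, `tfidf`, `centroid`, `predict`) are hypothetical, and a simple nearest-centroid rule with cosine similarity stands in for the ML models compared in the paper. The smoothed IDF formula used here is one common variant, assumed rather than taken from the article.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would clean tweets more carefully.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are ignored."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(text, idf, centroids):
    """Assign the label whose class centroid is most similar to the text."""
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy data (assumption): not drawn from the corpus used in the paper.
train = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you will not believe this shocking secret", "fake"),
    ("city council approves new budget", "real"),
    ("council announces road repair schedule", "real"),
]

idf = fit_idf([text for text, _ in train])
centroids = {
    label: centroid([tfidf(text, idf) for text, lab in train if lab == label])
    for label in ("fake", "real")
}

print(predict("shocking secret cure", idf, centroids))  # fake
print(predict("new council budget", idf, centroids))    # real
```

The sketch uses content-level features only, in the spirit of the abstract: each text is reduced to weighted term counts, with no account or network metadata. Word embeddings, which the abstract pairs with the neural network models, would replace the sparse dictionaries with dense learned vectors.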

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
