Illusion of Truth: Analysing and Classifying COVID-19 Fake News in Brazilian Portuguese Language

Patricia Takako Endo; Guto Leoni Santos; Maria Eduarda de Lima Xavier; Gleyson Rhuan Nascimento Campos; Luciana Conceição de Lima; Ivanovitch Silva; Antonia Egli; Theo Lynn

doi:10.3390/bdcc6020036

Big Data and Cognitive Computing (Apr 2022)

Affiliations

Patricia Takako Endo: Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife 50720-001, Brazil
Guto Leoni Santos: Centro de Informática, Universidade Federal de Pernambuco, Recife 50740-560, Brazil
Maria Eduarda de Lima Xavier: Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife 50720-001, Brazil
Gleyson Rhuan Nascimento Campos: Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife 50720-001, Brazil
Luciana Conceição de Lima: Programa de Pós-Graduação em Demografia, Universidade Federal do Rio Grande do Norte, Natal 59078-970, Brazil
Ivanovitch Silva: Programa de Pós-Graduação em Engenharia Elétrica e de Computação, Universidade Federal do Rio Grande do Norte, Natal 59078-970, Brazil
Antonia Egli: Business School, Dublin City University, Collins Avenue, D09 Y5N0 Dublin, Ireland
Theo Lynn: Business School, Dublin City University, Collins Avenue, D09 Y5N0 Dublin, Ireland

DOI: https://doi.org/10.3390/bdcc6020036
Journal volume & issue: Vol. 6, no. 2, p. 36

Abstract

Public health interventions to counter the COVID-19 pandemic have accelerated and increased digital adoption and use of the Internet for sourcing health information. Unfortunately, there is evidence to suggest that they have also accelerated and increased the spread of false information relating to COVID-19. Misinformation, disinformation and misinterpretation of health information can interfere with attempts to curb the virus, delay or prevent people from seeking or continuing legitimate medical treatment and adhering to vaccination, and undermine sound public health policy and attempts to disseminate public health messages. While there is a significant body of literature, datasets and tools to support countermeasures against the spread of false information online in resource-rich languages such as English and Chinese, there are few such resources for Portuguese, and Brazilian Portuguese specifically. In this study, we explore the use of machine learning and deep learning techniques to identify fake news in online communications in the Brazilian Portuguese language relating to the COVID-19 pandemic. We build a dataset of 11,382 items comprising data from January 2020 to February 2021. Exploratory data analysis suggests that fake news about the COVID-19 vaccine was prevalent in Brazil, much of it related to government communications. To mitigate the adverse impact of fake news, we analyse how stop words in communications affect the performance of machine learning models for detecting fake news. The results suggest that keeping stop words within the message improves the performance of the models. Random Forest was the machine learning model with the best results, achieving a precision of 97.91%, while Bi-GRU was the best deep learning model, with an F1 score of 94.03%.
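
The abstract contrasts models trained on messages with and without stop words. The following is a minimal sketch of that preprocessing choice only, not the authors' pipeline. The stop word list `PT_STOP_WORDS`, the helper `tokenize`, and the sample message are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# it is not the list used in the study.
PT_STOP_WORDS = {
    "a", "o", "as", "os", "de", "da", "do", "e", "em",
    "que", "para", "com", "não", "um", "uma",
}


def tokenize(text, keep_stop_words=True):
    """Lowercase the text, split it into word tokens, and optionally drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if keep_stop_words:
        return tokens
    return [t for t in tokens if t not in PT_STOP_WORDS]


# Invented sample message, not taken from the dataset.
message = "A vacina não altera o DNA de uma pessoa"

with_stop = tokenize(message, keep_stop_words=True)
without_stop = tokenize(message, keep_stop_words=False)

print(with_stop)
print(without_stop)
```

Either token list could then be passed to a feature extractor and classifier. Note that removing stop words here also removes the negation "não", which is one plausible way that discarding them could lose information.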

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC
