RumorLLM: A Rumor Large Language Model-Based Fake-News-Detection Data-Augmentation Approach

Jianqiao Lai; Xinran Yang; Wenyue Luo; Linjiang Zhou; Langchen Li; Yongqi Wang; Xiaochuan Shi

doi:10.3390/app14083532

Applied Sciences (Apr 2024)

Affiliations

Jianqiao Lai: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Xinran Yang: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Wenyue Luo: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Linjiang Zhou: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Langchen Li: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Yongqi Wang: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Xiaochuan Shi: School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China

DOI: https://doi.org/10.3390/app14083532
Journal volume & issue: Vol. 14, no. 8, p. 3532

Abstract

With the rapid development of the Internet and social media, false information, rumors, and misleading content have become pervasive, threatening public opinion and social stability and, in some cases, causing serious societal harm. This paper addresses the challenges of fake-news detection by presenting the “Rumor Large Language Model” (RumorLLM), a large language model fine-tuned on the writing style and content of rumors. The key contributions are RumorLLM itself and a data-augmentation method for minority categories that mitigates the category imbalance found in real-world fake-news datasets. Experimental results on the BuzzFeed and PolitiFact datasets show that the proposed model outperforms baseline methods, particularly in F1 score and AUC-ROC. This performance indicates that the approach handles imbalanced datasets effectively and offers a promising response to the proliferation of false information.
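
The general idea of balancing an imbalanced fake-news dataset by generating extra minority-class examples can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `augment_minority`, `placeholder_generator`, and `f1_score` are hypothetical names, and the placeholder generator merely stands in for a fine-tuned model such as RumorLLM, whose actual interface is not described in this record.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label); assumed convention: 1 = fake, 0 = real


def augment_minority(data: List[Example],
                     generate: Callable[[str], str]) -> List[Example]:
    """Append generated variants of minority-class texts until classes are balanced."""
    counts = Counter(label for _, label in data)
    if len(counts) < 2:
        return list(data)  # nothing to balance against
    minority, n_min = min(counts.items(), key=lambda kv: kv[1])
    n_max = max(counts.values())
    seeds = [text for text, label in data if label == minority]
    augmented = list(data)
    for i in range(n_max - n_min):
        # Cycle through the minority texts and use each as a generation seed.
        augmented.append((generate(seeds[i % len(seeds)]), minority))
    return augmented


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def placeholder_generator(seed: str) -> str:
    # Hypothetical stand-in for a generative model; it only tags the seed text.
    return f"[variant] {seed}"


if __name__ == "__main__":
    toy = [("a", 0), ("b", 0), ("c", 0), ("d", 0), ("rumor", 1)]
    balanced = augment_minority(toy, placeholder_generator)
    print(Counter(label for _, label in balanced))
```

In practice, the generator would be replaced by a model call, and a detector trained on the balanced set would be scored with F1 (and AUC-ROC) on held-out data that has not been augmented.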

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
