How Genuine is Computer-generated News? — Evaluation of Automated Text Generation Applied to Economic News

Yuen-Hsien Tseng; Yu-Chi Lin

doi:10.6182/jlis.202106_19(1).043

Journal of Library and Information Studies (Jun 2021)


Affiliations

Yuen-Hsien Tseng: Graduate Institute of Library and Information Studies, National Taiwan Normal University, Taipei, Taiwan
Yu-Chi Lin: Graduate Institute of Library and Information Studies, National Taiwan Normal University, Taipei, Taiwan

DOI: https://doi.org/10.6182/jlis.202106_19(1).043
Journal volume & issue: Vol. 19, no. 1
pp. 43 – 65

Abstract

This research explores the GPT-2 deep learning model for generating economic news and evaluates its output. After GPT-2 was trained on about 300,000 news articles totaling 150 million words, it generated 15 news articles. These were combined with 15 real news articles written by journalists, and 12 subjects were invited to rate the credibility of the 30 articles on a scale of 1 to 5. The 8 subjects who had graduated from economics-related majors were better able to distinguish human-composed news (HCN) from computer-generated news (CGN), while the 4 subjects who had graduated from non-economics majors discriminated poorly, and one was unable to tell HCN from CGN at all. Among the 15 HCN articles, 1 was rated as non-genuine, with an average credibility of 2.92 (below 3), owing to a lack of logic and strong subjectivity. Among the 15 CGN articles, 2 were rated as genuine, with an average credibility of 3.33 (above 3), because their content was reasonable and their details were logical. Comparing these two articles with the corpus showed that the computer's ability to substitute and retouch text can deceive professionals. However, most of the CGN articles were spotted, mainly because of obvious factual flaws and incorrect numbers such as dates and stock codes. The research also explores the possibility of automatically detecting computer-generated news with a BERT-based neural network model. BERT made only 2 false predictions on the 30 news articles above, outperforming the collective prediction of the 12 subjects, which had 5 errors. Further large-scale experiments show that BERT can reach an F-score of 0.96. (Article content in Chinese with English extended abstract)
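
The arithmetic behind these figures can be illustrated with a minimal Python sketch. It assumes that an article counts as "genuine" when its mean rating across the 12 subjects exceeds 3, as the abstract's wording suggests, and that the F-score is the usual harmonic mean of precision and recall. The function names and the individual ratings are hypothetical, chosen only so that the means round to the reported 2.92 and 3.33; they are not data from the study.

```python
from statistics import mean

# Assumed decision threshold: the midpoint of the 1-to-5 credibility scale.
THRESHOLD = 3.0


def collective_label(ratings):
    """Label an article from the mean of the subjects' credibility ratings."""
    return "genuine" if mean(ratings) > THRESHOLD else "non-genuine"


def accuracy(errors, total):
    """Fraction of articles classified correctly."""
    return (total - errors) / total


def f_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ratings from 12 subjects (illustrative only, not study data).
hcn_ratings = [3, 3, 2, 4, 3, 2, 3, 3, 4, 2, 3, 3]  # sum 35, mean about 2.92
cgn_ratings = [4, 3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3]  # sum 40, mean about 3.33

print(collective_label(hcn_ratings))  # mean below 3, so "non-genuine"
print(collective_label(cgn_ratings))  # mean above 3, so "genuine"

# Error counts reported in the abstract, out of 30 articles.
print(round(accuracy(2, 30), 3))  # BERT: 28 of 30 correct
print(round(accuracy(5, 30), 3))  # 12 subjects collectively: 25 of 30 correct
```

Under these assumptions, 2 errors out of 30 correspond to an accuracy of about 0.93 for BERT, against about 0.83 for the collective human judgment.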

Published in Journal of Library and Information Studies

ISSN: 1606-7509 (Print)
Publisher: National Taiwan University
Country of publisher: Taiwan, Province of China
LCC subjects: Bibliography. Library science. Information resources
Website: http://jlis.lis.ntu.edu.tw/
