Automatic Identification of Information Quality Metrics in Health News Stories

Majed Al-Jefri; Roger Evans; Joon Lee; Pietro Ghezzi

doi:10.3389/fpubh.2020.515347

Frontiers in Public Health (Dec 2020)

Affiliations

Majed Al-Jefri: Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Majed Al-Jefri: Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Roger Evans: School of Computing, Engineering and Mathematics, University of Brighton, Brighton, United Kingdom
Joon Lee: Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Joon Lee: Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Joon Lee: Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Pietro Ghezzi: Brighton & Sussex Medical School, Falmer, Brighton, United Kingdom

Journal volume & issue: Vol. 8

Abstract

Objective: Many online and printed media outlets publish health news of questionable trustworthiness, and it may be difficult for laypersons to judge the information quality of such articles. The purpose of this work was to propose a methodology for automatically assessing the quality of health-related news stories using natural language processing and machine learning.

Materials and Methods: We used a database from the website HealthNewsReview.org, which aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria for critically analyzing claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the pre-trained language model BERT.

Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory owing to the dataset size, the need for external knowledge, or the subjectivity of the evaluation process.

Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why the difficulty varied between criteria. For some criteria, the results obtained were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret the meaning of the text and draw on external knowledge in their assessment.
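The abstract names two concrete ingredients: word n-gram features and the F1 measure used to report results per criterion. The sketch below illustrates only those two pieces for a single binary criterion (for example, "does the story mention costs?"). It is not the authors' pipeline: the regex tokenizer, the function names `ngram_features` and `f1_score`, and the toy labels are hypothetical, and the classifier, per-task feature selection, and BERT experiments are not shown.

```python
import re
from collections import Counter


def ngram_features(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams of length 1..n_max (hypothetical tokenizer: lowercase alphanumerics)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    feats = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens across the story and count each n-gram.
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def f1_score(y_true, y_pred, positive=1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    feats = ngram_features("The drug costs $500 per month.")
    print(feats["drug costs"], len(feats))
    # Toy gold labels vs. predictions for a "mentions costs" criterion.
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

For the example sentence, the tokenizer yields six unigrams and five bigrams (11 distinct features, with "drug costs" counted once), and the toy predictions give one true positive, one false positive, and one false negative, so precision, recall, and F1 are all 0.5. In a real setup these sparse counts would feed a trained classifier, with the F1 computed on held-out stories.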

Published in Frontiers in Public Health

ISSN: 2296-2565 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Public aspects of medicine
Website: https://www.frontiersin.org/journals/public-health