Using machine learning to extract information and predict outcomes from reports of randomised trials of smoking cessation interventions in the Human Behaviour-Change Project [version 2; peer review: 2 approved, 1 approved with reservations]

Pol Mac Aonghusa; Alison J. Wright; Robert West; Janna Hastings; Yufang Hou; Alison O'Mara-Eves; Francesca Bonin; Martin Gleize; Susan Michie; Marie Johnston; James Thomas

Wellcome Open Research (Nov 2024)

Affiliations

Pol Mac Aonghusa: IBM Research Europe, Dublin, Ireland
Alison J. Wright: Institute of Pharmaceutical Science, King's College London, London, England, UK
Robert West: Research Department of Behavioural Science and Health, University College London, London, England, UK
Janna Hastings: Institute for Implementation Science in Health Care, Faculty of Medicine, University of Zurich, Zürich, Zurich, Switzerland
Yufang Hou: IBM Research Europe, Dublin, Ireland
Alison O'Mara-Eves: EPPI-Centre, Social Research Institute, University College London, London, England, UK
Francesca Bonin: IBM Research Europe, Dublin, Ireland
Martin Gleize: IBM Research Europe, Dublin, Ireland
Susan Michie: Centre for Behaviour Change, University College London, London, England, UK
Marie Johnston: Aberdeen Health Psychology Group, University of Aberdeen, Aberdeen, Scotland, UK
James Thomas: EPPI-Centre, Social Research Institute, University College London, London, England, UK

Journal volume & issue: Vol. 8

Abstract

Background: Using reports of randomised trials of smoking cessation interventions as a test case, this study aimed to develop and evaluate machine learning (ML) algorithms for extracting information from study reports and predicting outcomes as part of the Human Behaviour-Change Project. It is the first of two linked papers, with the second paper reporting on further development of a prediction system.

Methods: Researchers manually annotated 70 items of information (‘entities’) in 512 reports of randomised trials of smoking cessation interventions covering intervention content and delivery, population, setting, outcome and study methodology using the Behaviour Change Intervention Ontology. These entities were used to train ML algorithms to extract the information automatically. The information extraction ML algorithm involved a named-entity recognition system using the ‘FLAIR’ framework. The manually annotated intervention, population, setting and study entities were used to develop a deep-learning algorithm using multiple layers of long short-term memory (LSTM) components to predict smoking cessation outcomes.

Results: The F1 evaluation score, derived from the false positive and false negative rates (range 0–1), for the information extraction algorithm averaged 0.42 across different types of entity (SD=0.22, range 0.05–0.88) compared with an average human annotator’s score of 0.75 (SD=0.15, range 0.38–1.00). The algorithm for assigning entities to study arms (e.g., intervention or control) was not successful. This initial ML outcome prediction algorithm did not outperform prediction based just on the mean outcome value or a linear regression model.

Conclusions: While some success was achieved in using ML to extract information from reports of randomised trials of smoking cessation interventions, we identified major challenges that could be addressed by greater standardisation in the way that studies are reported. Outcome prediction from smoking cessation studies may benefit from development of novel algorithms, e.g., using ontological information to inform ML (as reported in the linked paper [1]).
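
For readers unfamiliar with the F1 score quoted in the abstract, the following is a minimal illustrative sketch of how such a score can be computed from counts of true positives, false positives and false negatives. It is not the authors' evaluation code: the function names `f1_score` and `entity_f1`, the exact-match comparison of extracted entities, and the example tuples are all assumptions made for illustration, and the abstract does not state the matching criteria actually used.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when there are no true positives,
    which also avoids division by zero.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def entity_f1(gold: set, predicted: set) -> float:
    """F1 for one entity type, assuming exact-match comparison.

    `gold` holds the human annotations and `predicted` the machine
    extractions. Exact matching is an assumption of this sketch.
    """
    tp = len(gold & predicted)       # extracted and annotated
    fp = len(predicted - gold)       # extracted but not annotated
    fn = len(gold - predicted)       # annotated but missed
    return f1_score(tp, fp, fn)


# Hypothetical example: (report id, entity type, extracted value)
gold = {("r1", "mean age", "42.3"),
        ("r2", "mean age", "38.0"),
        ("r3", "mean age", "51.7")}
predicted = {("r2", "mean age", "38.0"),
             ("r3", "mean age", "51.7"),
             ("r4", "mean age", "29.9")}

score = entity_f1(gold, predicted)
```

In this hypothetical example two of three extractions are correct and two of three annotations are recovered, so precision and recall are both 2/3 and the F1 score is also 2/3. Averaging such per-entity-type scores would give a summary figure comparable in kind to the 0.42 reported above, under the stated assumptions.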

Published in Wellcome Open Research

ISSN: 2398-502X (Online)
Publisher: Wellcome
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://wellcomeopenresearch.org/
