Modeling disagreement in automatic data labeling for semi-supervised learning in Clinical Natural Language Processing

Hongshu Liu; Nabeel Seedat; Julia Ive

doi:10.3389/frai.2024.1374162

Frontiers in Artificial Intelligence (Oct 2024)

Affiliations

Hongshu Liu: Department of Computing, Imperial College London, London, United Kingdom
Nabeel Seedat: Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
Julia Ive: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom

DOI: https://doi.org/10.3389/frai.2024.1374162
Journal volume & issue: Vol. 7

Abstract

Introduction: Computational models providing accurate estimates of their uncertainty are crucial for risk management associated with decision-making in healthcare contexts. This is especially true since many state-of-the-art systems are trained using the data which have been labeled automatically (self-supervised mode) and tend to overfit.

Methods: In this study, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports. This problem remains understudied for Natural Language Processing in the healthcare domain.

Results: We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of three uncertainty labels based on the negative log predictive probability (NLPP) evaluation metric and mean maximum predicted confidence levels (MMPCL), whilst retaining strong predictive performance.

Discussion: Our conclusions highlight the utility of probabilistic models applied to “noisy” labels and that similar methods could provide utility for Natural Language Processing (NLP) based automated labeling tasks.
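
As a rough illustration of the two evaluation metrics named in the abstract, the sketch below computes them for a toy three-class problem. The definitions used here are assumptions, since the abstract does not spell them out: NLPP is taken as the mean negative log probability a model assigns to the true label, and MMPCL as the mean of the highest predicted class probability per example. The function names `nlpp` and `mmpcl`, the `eps` clipping constant, and the toy data are hypothetical and not taken from the paper.

```python
import math


def nlpp(probs, labels, eps=1e-12):
    """Mean negative log probability assigned to the true class (assumed definition)."""
    total = 0.0
    for row, y in zip(probs, labels):
        # Clip at eps so a zero probability does not produce log(0).
        total += -math.log(max(row[y], eps))
    return total / len(labels)


def mmpcl(probs):
    """Mean of the maximum predicted class probability per example (assumed definition)."""
    return sum(max(row) for row in probs) / len(probs)


# Toy predictive distributions over three classes (illustrative values only).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 1, 2]

print("NLPP:", nlpp(probs, labels))
print("MMPCL:", mmpcl(probs))
```

Under these assumed definitions, a lower NLPP means the model places more probability mass on the correct labels, while MMPCL summarizes how confident the model is on average regardless of whether it is correct.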

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
