Natural language processing for automated triage and prioritization of individual case safety reports for case-by-case assessment

Thomas Lieber; Helen R. Gosselt; Pelle C. Kools; Okko C. Kruijssen; Stijn N. C. Van Lierop; Linda Härmark; Florence P. A. M. Van Hunsel

doi:10.3389/fdsfr.2023.1120135

Frontiers in Drug Safety and Regulation (Feb 2023)

Affiliations

Thomas Lieber: Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, Netherlands
Helen R. Gosselt: Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, Netherlands
Pelle C. Kools: Faculty of Social Sciences, Radboud Universiteit, Nijmegen, Netherlands
Okko C. Kruijssen: Faculty of Social Sciences, Radboud Universiteit, Nijmegen, Netherlands
Stijn N. C. Van Lierop: Faculty of Social Sciences, Radboud Universiteit, Nijmegen, Netherlands
Linda Härmark: Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, Netherlands
Florence P. A. M. Van Hunsel: Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, Netherlands

DOI: https://doi.org/10.3389/fdsfr.2023.1120135
Journal volume & issue: Vol. 3

Abstract

Objective: To improve a previously developed prediction model that could assist in the triage of individual case safety reports (ICSRs) by adding features derived from free-text fields using natural language processing (NLP).

Methods: Structured features and NLP features were used to train a bagging classifier model. NLP features were extracted from free-text fields using a bag-of-words model. Stop words were removed, and words whose distribution differed significantly between case and non-case reports were retained for the training data. Besides NLP features from free-text fields, the data also included a list of signal words deemed important by expert report assessors. Lastly, variables with multiple categories were transformed to numerical variables using the weight of evidence method.

Results: The model, a bagging classifier of decision trees, had an AUC of 0.921 (95% CI: 0.918–0.925). Generic drug name, info text length, ATC code, BMI, and patient age were the most important features in classification.

Conclusion: This predictive model using natural language processing could be used to assist assessors in prioritizing which future ICSRs to assess first, based on the probability that a report is a case requiring clinical review.
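
The weight of evidence transformation named in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact formula or any smoothing used, so the sign convention below (log of the share of cases over the share of non-cases per category), the additive `smoothing` parameter, the function name `weight_of_evidence`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def weight_of_evidence(categories, labels, smoothing=0.5):
    """Map each category to ln(P(category | case) / P(category | non-case)).

    `smoothing` is an assumed additive constant that keeps the ratio finite
    when a category never occurs in one of the two classes.
    """
    case_counts = Counter(c for c, y in zip(categories, labels) if y == 1)
    noncase_counts = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = set(case_counts) | set(noncase_counts)

    # Smoothed class totals, so the per-class shares still sum to 1.
    n_case = sum(case_counts.values()) + smoothing * len(levels)
    n_noncase = sum(noncase_counts.values()) + smoothing * len(levels)

    return {
        level: math.log(
            ((case_counts[level] + smoothing) / n_case)
            / ((noncase_counts[level] + smoothing) / n_noncase)
        )
        for level in levels
    }


# Toy data invented for illustration: a categorical feature and case labels.
atc_codes = ["A", "A", "B", "B", "B", "C"]
is_case = [1, 1, 0, 0, 1, 0]

woe = weight_of_evidence(atc_codes, is_case)
encoded = [woe[code] for code in atc_codes]  # numerical version of the feature
```

Under this convention, a category seen mostly in case reports receives a positive value and one seen mostly in non-case reports a negative value, so a many-level variable such as a drug name collapses into a single numerical column that a tree-based classifier can split on.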

Published in Frontiers in Drug Safety and Regulation

ISSN: 2674-0869 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Therapeutics. Pharmacology
Website: https://www.frontiersin.org/journals/drug-safety-and-regulation/
