Annotation of epilepsy clinic letters for natural language processing

Beata Fonferko-Shadrach; Huw Strafford; Carys Jones; Russell A. Khan; Sharon Brown; Jenny Edwards; Jonathan Hawken; Luke E. Shrimpton; Catharine P. White; Robert Powell; Inder M. S. Sawhney; William O. Pickrell; Arron S. Lacey

doi:10.1186/s13326-024-00316-z

Journal of Biomedical Semantics (Sep 2024)

Affiliations

Beata Fonferko-Shadrach: Swansea University Medical School, Swansea University
Huw Strafford: Swansea University Medical School, Swansea University
Carys Jones: Swansea University Medical School, Swansea University
Russell A. Khan: Swansea University Medical School, Swansea University
Sharon Brown: Neurology Department, Swansea Bay University Health Board
Jenny Edwards: Neurology Department, Swansea Bay University Health Board
Jonathan Hawken: Neurology Department, Swansea Bay University Health Board
Luke E. Shrimpton: Neurology Department, Swansea Bay University Health Board
Catharine P. White: Swansea University Medical School, Swansea University
Robert Powell: Swansea University Medical School, Swansea University
Inder M. S. Sawhney: Swansea University Medical School, Swansea University
William O. Pickrell: Swansea University Medical School, Swansea University
Arron S. Lacey: Swansea University Medical School, Swansea University

DOI: https://doi.org/10.1186/s13326-024-00316-z
Journal volume & issue: Vol. 15, no. 1, pp. 1–5

Abstract

Background: Natural language processing (NLP) is increasingly being used to extract structured information from unstructured text to assist clinical decision-making and aid healthcare research. The availability of expert-annotated documents for the development and validation of NLP applications is limited. We created synthetic clinical documents to address this, and to validate the Extraction of Epilepsy Clinical Text version 2 (ExECTv2) NLP pipeline.

Methods: We created 200 synthetic clinic letters based on hospital outpatient consultations with epilepsy specialists. The letters were double annotated by trained clinicians and researchers according to agreed guidelines. We used the annotation tool Markup with an epilepsy concept list based on the Unified Medical Language System ontology. All annotations were reviewed, and a gold standard set of annotations was agreed and used to validate the performance of ExECTv2.

Results: The overall inter-annotator agreement (IAA) between the two sets of annotations produced a per-item F1 score of 0.73. Validating ExECTv2 against the gold standard gave an overall F1 score of 0.87 per item and 0.90 per letter.

Conclusion: The synthetic letters, annotations, and annotation guidelines have been made freely available. To our knowledge, this is the first publicly available set of annotated epilepsy clinic letters and guidelines that can be used by NLP researchers with minimal epilepsy knowledge. The IAA results show that clinical text annotation tasks are difficult and require a gold standard to be agreed by researcher consensus. ExECTv2, our automated epilepsy NLP pipeline, extracted detailed epilepsy information from unstructured epilepsy letters with more accuracy than human annotators, further confirming the utility of NLP for clinical and research applications.
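
As a rough illustration of the per-item F1 scores reported above, the following Python sketch computes F1 between two sets of annotations. It is a minimal example under stated assumptions, not the scoring procedure used in the paper: the function name `f1_agreement`, the exact-match criterion, the tuple layout, and the category labels and letter identifier are all hypothetical, chosen only to show the arithmetic.

```python
from typing import Hashable, Iterable


def f1_agreement(reference: Iterable[Hashable], candidate: Iterable[Hashable]) -> float:
    """F1 between two sets of annotation items, using exact matching (an assumption)."""
    ref, cand = set(reference), set(candidate)
    if not ref and not cand:
        return 1.0  # both empty: treated here as full agreement (an assumption)
    true_pos = len(ref & cand)  # items present in both sets
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(cand)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (letter id, concept category, normalised text)
annotator_a = {
    ("letter_001", "Diagnosis", "focal epilepsy"),
    ("letter_001", "Prescription", "lamotrigine"),
    ("letter_001", "SeizureFrequency", "2 per month"),
}
annotator_b = {
    ("letter_001", "Diagnosis", "focal epilepsy"),
    ("letter_001", "Prescription", "lamotrigine"),
    ("letter_001", "Investigation", "EEG abnormal"),
}

score = f1_agreement(annotator_a, annotator_b)
print(f"Per-item F1: {score:.2f}")
```

With exact matching, F1 is symmetric in its two arguments, so the same function can score one annotator against another (IAA) or a pipeline's output against a gold standard. A per-letter score would need a different unit of comparison, which this sketch does not attempt.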

Published in Journal of Biomedical Semantics

ISSN: 2041-1480 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://jbiomedsem.biomedcentral.com
