Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study

Sheng-Feng Sung; Ya-Han Hu; Chong-Yan Chen

doi:10.2196/56955

JMIR Medical Informatics (Oct 2024)



DOI: https://doi.org/10.2196/56955
Journal volume & issue: Vol. 12, e56955

Abstract


Background: Electronic medical records store extensive patient data and serve as a comprehensive repository, including textual medical records such as surgical and imaging reports. Their utility in clinical decision support systems is substantial, but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for natural language processing in these systems. Efficient abbreviation disambiguation methods are needed for effective information extraction.

Objective: This study aims to enhance the one-to-all (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple abbreviation meanings. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in Bidirectional Encoder Representations From Transformers (BERT), and to evaluate the model’s efficacy in expanding clinical abbreviations using real data.

Methods: Three datasets were used: Medical Subject Headings Word Sense Disambiguation, University of Minnesota, and Chia-Yi Christian Hospital, the last from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. The study involved fine-tuning two pretrained models, ClinicalBERT and BlueBERT, and generating dataset pairs for training and testing based on Huang et al’s method.

Results: BlueBERT achieved macro- and microaccuracies of 95.41% and 95.16%, respectively, on the Medical Subject Headings Word Sense Disambiguation dataset. It improved macroaccuracy by 0.54% to 1.53% compared to two baselines, long short-term memory and deepBioWSD with random embedding. On the University of Minnesota dataset, BlueBERT recorded macro- and microaccuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec + support vector machine and BioWordVec + support vector machine, BlueBERT demonstrated a macroaccuracy improvement of 2.61% to 4.13%.

Conclusions: This research preliminarily validated the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating the potential to enhance both clinical staff efficiency and research effectiveness.
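
The abstract reports both macro- and microaccuracy without defining them. The sketch below illustrates one common convention in abbreviation disambiguation work, and it is an assumption that the study follows it: microaccuracy pools all test instances, while macroaccuracy first computes accuracy per abbreviation and then averages across abbreviations. The function name `macro_micro_accuracy` and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def macro_micro_accuracy(records):
    """Return (macro, micro) accuracy.

    records: iterable of (abbreviation, gold_sense, predicted_sense).
    Assumed convention: macro = mean of per-abbreviation accuracies,
    micro = correct predictions / all predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbr, gold, pred in records:
        total[abbr] += 1
        correct[abbr] += int(gold == pred)
    if not total:
        raise ValueError("records must not be empty")

    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[a] / total[a] for a in total) / len(total)
    return macro, micro


# Toy data (made up for illustration): "RA" has 4 instances, "PT" has 1.
toy = [
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "right atrium", "rheumatoid arthritis"),  # wrong prediction
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "right atrium", "right atrium"),
    ("PT", "physical therapy", "physical therapy"),
]
macro, micro = macro_micro_accuracy(toy)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

Under this convention the two figures diverge when abbreviations have unequal numbers of test instances, which is one reason a study might report both.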

Published in JMIR Medical Informatics

ISSN: 2291-9694 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://medinform.jmir.org
