Development and external validation of automated ICD-10 coding from discharge summaries using deep learning approaches

Wanchana Ponthongmak; Ratchainant Thammasudjarit; Gareth J McKay; John Attia; Nawanan Theera-Ampornpunt; Ammarin Thakkinstian

Informatics in Medicine Unlocked (Jan 2023)

Affiliations

Wanchana Ponthongmak: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Bangkok, Thailand
Ratchainant Thammasudjarit: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Bangkok, Thailand
Gareth J McKay: Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
John Attia: Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, NSW, Australia
Nawanan Theera-Ampornpunt: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Bangkok, Thailand; Corresponding author. 4th Floor, Sukho Place Building, Sukhothai Road, Dusit, Bangkok, 10300, Thailand.
Ammarin Thakkinstian: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine, Ramathibodi Hospital, Bangkok, Thailand; Corresponding author. 4th Floor, Sukho Place Building, Sukhothai Road, Dusit, Bangkok, 10300, Thailand.

Journal volume & issue: Vol. 38, p. 101227

Abstract

Objectives: To develop an automated International Classification of Diseases (ICD) coding tool using natural language processing (NLP) and discharge summary texts from Thailand.

Materials and methods: The development phase included 15,329 discharge summaries from Ramathibodi Hospital from January 2015 to December 2020. The external validation phase used Medical Information Mart for Intensive Care III (MIMIC-III) data. Three algorithms were developed: naïve Bayes with term frequency-inverse document frequency (NB-TF-IDF), convolutional neural network with neural word embedding (CNN-NWE), and CNN with PubMedBERT (CNN-PubMedBERT). In addition, two state-of-the-art models were considered: convolutional attention for multi-label classification (CAML) and pretrained language models for automatic ICD coding (PLM-ICD).

Results: The CNN-PubMedBERT model provided average micro- and macro-area under the precision-recall curve (AUPRC) of 0.6605 and 0.5538. Compared with CNN-NWE (0.6528 and 0.5564), NB-TF-IDF (0.4441 and 0.3562), and CAML (0.6257 and 0.4964), the corresponding differences were (0.0077 and −0.0026), (0.2164 and 0.1976), and (0.0348 and 0.0574), respectively. CNN-PubMedBERT therefore outperformed all three models on micro-AUPRC, and all but CNN-NWE on macro-AUPRC. However, CNN-PubMedBERT performed less well than PLM-ICD, which had corresponding AUPRCs of 0.7202 and 0.5865. The CNN-PubMedBERT model was externally validated using two subsets of MIMIC-III: the MIMIC-ICD-10 and MIMIC-ICD-9 datasets, which contained 40,923 and 31,196 discharge summaries, respectively. The average micro-AUPRCs were 0.3745, 0.6878, and 0.6699 for direct prediction on MIMIC-ICD-10, MIMIC-ICD-10 fine-tuning, and MIMIC-ICD-9 fine-tuning, respectively; the average macro-AUPRCs for the corresponding approaches were 0.2819, 0.4219, and 0.5377.

Discussion: CNN-PubMedBERT performed second-best to PLM-ICD, with considerable variation observed between average micro- and macro-AUPRC, especially for external validation, generally indicating good overall prediction but limited predictive value for codes with small sample sizes. External validation in a US cohort demonstrated a higher level of model prediction performance.

Conclusion: Both PLM-ICD and CNN-PubMedBERT models may provide useful tools for automated ICD-10 coding. Nevertheless, further evaluation and validation within Thai and Asian healthcare systems may prove more informative for clinical application.
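The gap between micro- and macro-averaged AUPRC noted in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the abstract does not say how AUPRC was estimated, so the example assumes the common step-wise "average precision" estimator, and the function names and toy data are hypothetical. Micro-averaging pools every (document, code) pair into one ranking, so frequent codes dominate; macro-averaging takes the unweighted mean of per-code scores, so a poorly predicted rare code pulls the average down.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one binary label.

    Assumption: scores are untied; tied scores are simply ranked in input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("label has no positive examples")
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each true positive
    return total / n_pos


def macro_auprc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Unweighted mean of per-code AUPRC: every code counts equally."""
    n_labels = len(Y_true[0])
    per_label = [
        average_precision([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def micro_auprc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """AUPRC over all (document, code) pairs pooled into a single ranking."""
    flat_true = [v for row in Y_true for v in row]
    flat_score = [v for row in Y_score for v in row]
    return average_precision(flat_true, flat_score)


# Hypothetical toy data: 4 documents x 2 codes.
# Code 0 is frequent and ranked perfectly; code 1 is rare and ranked poorly.
Y_true = [[1, 0], [1, 0], [1, 1], [0, 0]]
Y_score = [[0.9, 0.6], [0.8, 0.2], [0.7, 0.3], [0.1, 0.4]]

micro = micro_auprc(Y_true, Y_score)
macro = macro_auprc(Y_true, Y_score)
print(f"micro-AUPRC = {micro:.4f}")  # 0.9167
print(f"macro-AUPRC = {macro:.4f}")  # 0.6667
```

In this toy case the rare code has an AUPRC of 1/3, which halves its contribution to the macro average while barely affecting the pooled micro score. That is the same pattern the Discussion describes: good overall prediction with weaker performance on codes that have few examples.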

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
