Informatics in Medicine Unlocked (Jan 2023)
Development and validation of deep learning and BERT models for classification of lung cancer radiology reports
Abstract
Purpose: Manual cohort building from radiology reports can be tedious. Natural Language Processing (NLP) can be used for automated cohort building. In this study, we have developed and validated an NLP approach based on deep learning (DL) to select lung cancer reports from a thoracic disease management group cohort. Materials and methods: 4064 radiology reports (CT and PET/CT) of a thoracic disease management group reported between 2014 and 2016 were used. These reports were anonymised, cleaned, text normalized and split into a training, testing, and validation set. External validation was performed on radiology reports from the MIMIC-III clinical database. We used three DL models, namely, Bi-LSTM_simple, Bi-LSTM_dropout, and Pre-trained _BERT model to predict if a report concerned lung cancer. We studied the effect of minority oversampling on all models. Results: Without oversampling, the F1 scores at 95% CI for Bi-LSTM_simple, Bi-LSTM_dropout and BERT were 0.89, 0.90, and 0.86; with oversampling, the F1 scores were 0.94, 0.94, and 0.9, on internal validation. On external validation the F1-scores of Bi-LSTM_simple, Bi-LSTM_dropout and BERT models were 0.63, 0.77 and 0.80 without oversampling and 0.72, 0.78 and 0.77 with oversampling. Conclusion: Pre-trained BERT model and Bi-LSTM_dropout models to predict a lung cancer report showed consistent performance on internal and external validation with the BERT model exhibiting superior performance. The overall F1 score decreased on external validation for both Bi-LSTM models with the Bi-LSTM_simple model showing a more significant drop. All models showed some improvement on minority oversampling.