Improving the Generalizability and Performance of an Ultrasound Deep Learning Model Using Limited Multicenter Data for Lung Sliding Artifact Identification
Derek Wu, Delaney Smith, Blake VanBerlo, Amir Roshankar, Hoseok Lee, Brian Li, Faraz Ali, Marwan Rahman, John Basmaji, Jared Tschirhart, Alex Ford, Bennett VanBerlo, Ashritha Durvasula, Claire Vannelli, Chintan Dave, Jason Deglint, Jordan Ho, Rushil Chaudhary, Hans Clausdorff, Ross Prager, Scott Millington, Samveg Shah, Brian Buchanan, Robert Arntfield
Affiliations
Derek Wu
Department of Medicine, Western University, London, ON N6A 5C1, Canada
Delaney Smith
Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Blake VanBerlo
Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Amir Roshankar
Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Hoseok Lee
Faculty of Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Brian Li
Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Faraz Ali
Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Marwan Rahman
Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
John Basmaji
Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
Jared Tschirhart
Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
Alex Ford
Independent Researcher, London, ON N6A 1L8, Canada
Bennett VanBerlo
Faculty of Engineering, Western University, London, ON N6A 5C1, Canada
Ashritha Durvasula
Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
Claire Vannelli
Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
Chintan Dave
Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
Jason Deglint
Faculty of Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Jordan Ho
Department of Family Medicine, Western University, London, ON N6A 5C1, Canada
Rushil Chaudhary
Department of Medicine, Western University, London, ON N6A 5C1, Canada
Hans Clausdorff
Departamento de Medicina de Urgencia, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
Ross Prager
Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
Scott Millington
Department of Critical Care Medicine, University of Ottawa, Ottawa, ON K1N 6N5, Canada
Samveg Shah
Department of Medicine, University of Alberta, Edmonton, AB T6G 2R3, Canada
Brian Buchanan
Department of Critical Care, University of Alberta, Edmonton, AB T6G 2R3, Canada
Robert Arntfield
Division of Critical Care Medicine, Western University, London, ON N6A 5C1, Canada
Deep learning (DL) models for medical image classification frequently struggle to generalize to data from outside institutions. Moreover, additional clinical data are rarely collected to comprehensively assess and understand model performance across subgroups. Following the development of a single-center model to identify the lung sliding artifact on lung ultrasound (LUS), we pursued a validation strategy using external LUS data. Because annotated LUS data are relatively scarce compared with other medical imaging data, we adopted a novel technique to make the most of limited external data and improve model generalizability. Externally acquired LUS data from three tertiary care centers, totaling 641 clips from 238 patients, were used to assess the baseline generalizability of our lung sliding model. We then employed our novel Threshold-Aware Accumulative Fine-Tuning (TAAFT) method to fine-tune the baseline model and determine the minimum amount of data required to achieve predefined performance goals. We also performed a subgroup analysis and examined Grad-CAM++ explanations. The final model, fine-tuned on one-third of the external dataset, achieved 0.917 sensitivity, 0.817 specificity, and 0.920 area under the receiver operating characteristic curve (AUC) on the external validation dataset, exceeding our predefined performance goals. Subgroup analyses identified the LUS characteristics that most challenged the model’s performance. Grad-CAM++ saliency maps highlighted clinically relevant regions on M-mode images. We report a multicenter study that exploits limited available external data to improve the generalizability and performance of our lung sliding model while identifying poorly performing subgroups to inform future iterative improvements. This approach may improve efficiency for DL researchers working with smaller quantities of external validation data.
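The abstract does not describe how TAAFT works internally. The Python sketch below illustrates only the stopping criterion the abstract mentions, which is finding the smallest amount of external data whose fine-tuned model meets predefined sensitivity and specificity goals at a fixed decision threshold. It is not the authors' TAAFT implementation. The checkpoint format, the function names `sensitivity_specificity` and `smallest_passing_fraction`, the threshold, and the goal values are all hypothetical, and fine-tuning itself is assumed to happen elsewhere.

```python
"""Hypothetical sketch of a stopping criterion for accumulative fine-tuning.

This is NOT the published TAAFT method. It only shows how one might check a
series of fine-tuned checkpoints against predefined sensitivity/specificity
goals and return the smallest data fraction that satisfies both.
"""
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def smallest_passing_fraction(
    checkpoints: Sequence[Tuple[float, Sequence[int], Sequence[float]]],
    threshold: float,
    min_sensitivity: float,
    min_specificity: float,
) -> Optional[float]:
    """Return the smallest data fraction whose checkpoint meets both goals.

    Each (hypothetical) checkpoint is a tuple of
    (fraction_of_external_data_used, validation_labels, validation_scores).
    Checkpoints are examined in increasing order of fraction, mimicking an
    accumulative schedule in which more external data is added at each step.
    Returns None if no checkpoint meets the goals.
    """
    for fraction, labels, scores in sorted(checkpoints, key=lambda c: c[0]):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        if sens >= min_sensitivity and spec >= min_specificity:
            return fraction
    return None


if __name__ == "__main__":
    # Toy, made-up validation scores for two checkpoints (not study data).
    labels = [1, 1, 0, 0]
    checkpoints = [
        (0.25, labels, [0.9, 0.4, 0.2, 0.6]),  # sensitivity 0.5, specificity 0.5
        (0.50, labels, [0.9, 0.7, 0.2, 0.3]),  # sensitivity 1.0, specificity 1.0
    ]
    print(smallest_passing_fraction(checkpoints, 0.5, 0.9, 0.8))
```

With these toy inputs the first checkpoint misses both goals and the function returns 0.5. Scanning checkpoints in increasing order keeps the criterion "minimum data needed" explicit. A real pipeline would also need to decide how the threshold itself is chosen, which this sketch leaves fixed.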