BioMedInformatics (Aug 2024)
Diffusion-Based Image Synthesis or Traditional Augmentation for Enriching Musculoskeletal Ultrasound Datasets
Abstract
Background: Machine learning models can provide quick and reliable assessments in place of medical practitioners. With over 50 million adults in the United States suffering from osteoarthritis, there is a need for models capable of interpreting musculoskeletal ultrasound images. However, machine learning requires large amounts of data, a requirement that poses significant challenges in medical imaging. Therefore, we explore two strategies for enriching a musculoskeletal ultrasound dataset that sidestep these challenges: traditional augmentation and diffusion-based image synthesis. Methods: First, we generate augmented and synthetic images to enrich our dataset. Then, we compare the images qualitatively and quantitatively and evaluate their effectiveness in training a deep learning model to detect thickened synovium and knee joint recess distension. Results: Our results suggest that synthetic images exhibit some anatomical fidelity and diversity, and that they help a model learn representations consistent with human opinion. In contrast, augmented images may impede model generalizability. Finally, a model trained on synthetically enriched data outperforms models trained on un-enriched and augmented datasets. Conclusions: We demonstrate that diffusion-based image synthesis is preferable to traditional augmentation. Our study underscores the importance of leveraging dataset enrichment strategies to address data scarcity in medical imaging and paves the way for the development of more advanced diagnostic tools.
Keywords