IEEE Access (Jan 2024)
Dual-Masked Autoencoders: Application to Multi-Labeled Pediatric Thoracic Diseases
Abstract
Pediatric thoracic diseases pose significant health risks to children. Chest X-rays are commonly used to diagnose thoracic diseases, but interpreting pediatric images presents unique challenges, including anatomical variations, developmental differences, and potential artifacts. Deep learning offers promise in addressing these challenges, yet its effectiveness is hindered by the limited availability of pediatric chest X-ray data. To overcome this limitation, we introduce the dual-masked autoencoders (dual-MAE) algorithm, which consists of an online network and a target network, each with encoder and decoder modules. The networks are optimized by minimizing three losses: between the reconstructed images of the online and target networks, between the input image and the online network's reconstruction, and between the input image and the target network's reconstruction. To learn efficiently from pediatric chest X-rays, we employ a two-step training strategy: pretraining the dual-MAE model on adult chest X-rays, then fine-tuning it on pediatric chest X-rays to diagnose multi-labeled pediatric thoracic diseases. The proposed model achieved the highest mean AUC (0.752), surpassing ResNet-34 (0.669) and ViT-S (0.645) trained from scratch. It also outperformed ResNet-34 (0.697) and ViT-S (0.638) pretrained on the ImageNet dataset and then fine-tuned on pediatric chest X-rays, even though dual-MAE was pretrained on significantly fewer images than the ImageNet dataset contains. Furthermore, it outperformed ResNet-34 (0.712), ViT-S (0.673), and the vanilla MAE method (0.735), all pretrained on adult chest X-rays and fine-tuned on pediatric chest X-rays. Even with only 50% of the labeled pediatric chest X-ray images, dual-MAE performed comparably to the vanilla MAE method and outperformed ResNet-34 and ViT-S fine-tuned with 100% of the labeled pediatric chest X-ray images.
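The three-term objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of each loss or how the terms are weighted, so the mean squared error, the equal default weights, and the names `mse` and `dual_mae_loss` below are assumptions made for illustration only. For simplicity the sketch also compares whole flattened images, whereas masked autoencoders typically score only the masked patches.

```python
from typing import Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_mae_loss(
    image: Sequence[float],
    online_recon: Sequence[float],
    target_recon: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Sum the three terms named in the abstract.

    Assumptions (not taken from the paper): each term is an MSE,
    and the terms are combined with equal weights by default.
    """
    w_consistency, w_online, w_target = weights
    # Term 1: online reconstruction vs. target reconstruction.
    consistency = mse(online_recon, target_recon)
    # Term 2: input image vs. online reconstruction.
    online_term = mse(image, online_recon)
    # Term 3: input image vs. target reconstruction.
    target_term = mse(image, target_recon)
    return (
        w_consistency * consistency
        + w_online * online_term
        + w_target * target_term
    )
```

In this form, the first term pulls the two networks' reconstructions toward each other, while the other two terms tie each reconstruction to the input image. The total is zero only when both reconstructions match the input exactly.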
Keywords