Scientific Reports (Aug 2024)

Comparative analysis of vision transformers and convolutional neural networks in osteoporosis detection from X-ray images

  • Ali Sarmadi,
  • Zahra Sadat Razavi,
  • Andre J. van Wijnen,
  • Madjid Soltani

DOI
https://doi.org/10.1038/s41598-024-69119-7
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 17

Abstract

Read online

Abstract Within the scope of this investigation, we carried out experiments to investigate the potential of the Vision Transformer (ViT) in the field of medical image analysis. The diagnosis of osteoporosis through inspection of X-ray radio-images is a substantial classification problem that we were able to address with the assistance of Vision Transformer models. In order to provide a basis for comparison, we conducted a parallel analysis in which we sought to solve the same problem by employing traditional convolutional neural networks (CNNs), which are well-known and commonly used techniques for the solution of image categorization issues. The findings of our research led us to conclude that ViT is capable of achieving superior outcomes compared to CNN. Furthermore, provided that methods have access to a sufficient quantity of training data, the probability increases that both methods arrive at more appropriate solutions to critical issues.