IEEE Access (Jan 2024)

Exploring Transformer for Face Mask Detection

  • Yonghua Mao,
  • Yuhang Lv,
  • Guangxin Zhang,
  • Xiaolin Gui

DOI
https://doi.org/10.1109/ACCESS.2024.3449802
Journal volume & issue
Vol. 12
pp. 118377 – 118388

Abstract

The COVID-19 pandemic has underscored the importance of face masks in curbing viral transmission, prompting governments worldwide to enforce stringent public health mandates requiring mask usage in public areas. Consequently, there is growing interest in automated mask detection technologies that augment these measures and minimize viral spread. In this study, we explore the potential of the Swin Transformer architecture for accurately identifying face mask usage, aiming to surpass the performance limits of existing face mask detection models. We evaluate our proposed model and comparison models using seven metrics: accuracy, precision, recall, specificity, F1-score, Kappa coefficient, and MCC. Our experiments yield several notable findings. First, MobileNetV2 outperforms the baseline CNN model across all seven metrics on the face mask datasets. Second, within the family of convolutional neural networks (CNNs), EfficientNetV2 outperforms MobileNetV2, a classic lightweight network, across all metrics, and DenseNet outperforms ResNet-50 across all seven metrics. Most significantly, the Swin Transformer architecture emerges as the most effective model, surpassing not only MobileNetV2 but also EfficientNetV2. The empirical results confirm that our Swin Transformer achieves statistically significant improvements in accuracy, precision, recall, specificity, F1-score, Kappa coefficient, and MCC compared to the other models.
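The seven metrics named in the abstract can all be derived from a binary confusion matrix (treating "masked" as the positive class). The following is a minimal pure-Python sketch of their standard definitions; the labels and predictions are illustrative, not taken from the paper's experiments.

```python
import math

def evaluate(y_true, y_pred):
    """Compute the seven metrics from binary labels (1 = masked, 0 = unmasked)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # sensitivity / true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement corrected for chance agreement pe
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - pe) / (1 - pe)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "kappa": kappa, "mcc": mcc}

# Illustrative example (hypothetical predictions, not from the paper)
m = evaluate([1, 1, 1, 0, 0, 1, 0, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```

Because Kappa and MCC both correct for chance-level agreement, they are often more informative than accuracy alone when the masked/unmasked classes are imbalanced, which is presumably why the authors report them alongside the usual metrics.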

Keywords