IEEE Photonics Journal (Jan 2024)

MCIR-YOLO: White Medication Pill Classification Using Multi-Band Infrared Images

  • Mohan Wang,
  • Yang Jiang,
  • Baohui Xu,
  • Mengqiang Huang,
  • Xu Xue,
  • Xu Wu,
  • Wenjian Kuang,
  • Xiang Liu,
  • Harm Tolner

DOI
https://doi.org/10.1109/JPHOT.2024.3426929
Journal volume & issue
Vol. 16, no. 4
pp. 1 – 13

Abstract

Identifying and classifying pills is a critical task in a modern hospital, particularly for avoiding medication errors. Conventional visual recognition and classification methods rely predominantly on visible-light imagery, which is inadequate for distinguishing white pills with similar visual characteristics. White pills do, however, exhibit distinctive infrared (IR) properties across different spectral bands. Building on this observation, this paper introduces MCIR-YOLO, a multi-band infrared object detection algorithm that enhances the YOLOv5s model through multimodal fusion. The study presents a novel dataset of IR images of white round pills captured in six channels, with peak wavelengths ranging from approximately 1400 nm to 1650 nm. A multimodal fusion strategy is proposed that integrates features at multiple levels across the six IR channels. It exploits the scale features inherent to each IR modality, enabling comprehensive information fusion across modalities. The model also incorporates an auxiliary detection branch, independent of the backbone, which uses the fused feature information to calculate a separate loss, helping to reduce the overall loss. Attention modules are integrated after two distinct fusion points to enhance feature precision; they leverage the mean and scaling of the IR features and significantly boost detection accuracy. Experimental results show that the improved model outperforms the baseline YOLOv5s model, particularly on a self-constructed dataset of white round pill IR images, where mAP0.5 increased by 5.47% and 7.96% for the single-channel (peak at 1650 nm) and six-channel configurations, respectively. Notably, six-channel recognition with MCIR-YOLO yields a substantial advantage of 12.05% over the best-performing single-channel IR image recognition.
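
As a rough illustration of the general idea of combining several IR bands and reweighting them with a mean-based gate, a minimal NumPy sketch might look like the following. This is not the MCIR-YOLO implementation: the function names `channel_attention` and `fuse_bands`, the sigmoid gate, and the random 8 x 8 inputs are all assumptions made for illustration, and the sketch has no learned weights, detection head, or auxiliary branch.

```python
import numpy as np


def channel_attention(features):
    """Reweight each channel of a (C, H, W) array by a gate derived from its mean.

    Hypothetical stand-in for a learned attention module: the gate is a
    sigmoid of each channel's mean, centred on the mean across channels.
    """
    means = features.mean(axis=(1, 2))                       # per-channel mean, shape (C,)
    gates = 1.0 / (1.0 + np.exp(-(means - means.mean())))    # sigmoid, values in (0, 1)
    return features * gates[:, None, None]                   # broadcast gate over H and W


def fuse_bands(bands):
    """Stack single-band images of equal size into (C, H, W) and apply the gate."""
    stacked = np.stack(bands, axis=0)
    return channel_attention(stacked)


# Six synthetic "IR bands" standing in for the six channels (assumed 8 x 8 size).
rng = np.random.default_rng(0)
bands = [rng.random((8, 8)) for _ in range(6)]
fused = fuse_bands(bands)
print(fused.shape)  # (6, 8, 8)
```

In a real detector, the gate would typically be produced by trainable layers and the fused tensor would feed the detection network; the sketch only shows how a per-band statistic can scale each band before further processing.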

Keywords