AIP Advances (Sep 2024)

Semantic segmentation algorithm for pantograph based on multi-scale strip pooling attention mechanism and application research

  • Renjie Shi,
  • Liming Li,
  • Shubin Zheng,
  • Yizhou Mao,
  • Xiaoxue An

DOI
https://doi.org/10.1063/5.0230117
Journal volume & issue
Vol. 14, no. 9
pp. 095310 – 095310-12

Abstract

Detecting pantographs remains challenging because of complex scenes, variable weather conditions, and noise interference. Existing pantograph detection methods struggle to segment the complete shape of the pantograph from intricate backgrounds in adverse weather, and they often fall short of real-time performance. To address these challenges, we propose a pantograph segmentation method built on a deep learning multi-scale strip pooling attention mechanism. Our approach uses the PidNet semantic segmentation network as the baseline architecture and introduces a newly designed multi-scale strip pooling attention mechanism for the detail extraction branch. The multi-scale strip convolution branch extracts pixel-level detail features of the pantograph, while the pooling branch extracts its macroscopic features. A linear interpolation method mitigates the influence of weather, improving segmentation accuracy while keeping the structure lightweight. In the context aggregation branch, a multi-scale context aggregation module based on gated convolution replaces the original network's module and provides strong pantograph positioning capability. Compared with existing pantograph detection methods, our model segments the pantograph accurately with a clearly defined shape, filters out extraneous background noise, and is highly robust to variations in illumination and weather conditions. In addition, we created a rich pantograph dataset covering various scenarios and weather conditions, which further enhanced the robustness of the model. The model achieves an IoU of 92.91% and an accuracy of 96.04% while still exceeding an inference speed of 30 FPS on a single 2080Ti GPU.
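
The abstract does not give the internals of the attention module, so the following is only a minimal sketch of the generic strip pooling idea it builds on, not the authors' implementation. It assumes a single-channel feature map stored as a list of rows, pools it along horizontal and vertical strips, and uses the fused strip averages as a sigmoid gate. The function name `strip_pool_attention` is hypothetical, and the learned convolutions, multiple scales, and interpolation step of the actual method are omitted.

```python
import math


def strip_pool_attention(feature):
    """Gate a 2D feature map with horizontal and vertical strip averages.

    Illustrative assumption: no learned layers, one channel, one scale.
    """
    height, width = len(feature), len(feature[0])
    # Horizontal strips: one average per row (pooling window 1 x W).
    row_means = [sum(row) / width for row in feature]
    # Vertical strips: one average per column (pooling window H x 1).
    col_means = [
        sum(feature[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    gated = []
    for i in range(height):
        out_row = []
        for j in range(width):
            # Broadcast both strip descriptors back to (i, j) and fuse by addition.
            fused = row_means[i] + col_means[j]
            # Sigmoid turns the fused descriptor into an attention weight in (0, 1).
            gate = 1.0 / (1.0 + math.exp(-fused))
            out_row.append(feature[i][j] * gate)
        gated.append(out_row)
    return gated
```

Because each gate depends on an entire row and an entire column, a long, thin structure such as a pantograph arm raises the weight of every position along its strip, which is the usual motivation for strip-shaped rather than square pooling windows.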