IET Computer Vision (Jun 2023)

Multi‐directional feature refinement network for real‐time semantic segmentation in urban street scenes

  • Yan Zhou,
  • Xihong Zheng,
  • Yin Yang,
  • Jianxun Li,
  • Jinzhen Mu,
  • Richard Irampaye

DOI
https://doi.org/10.1049/cvi2.12178
Journal volume & issue
Vol. 17, no. 4
pp. 431 – 444

Abstract

Efficient and accurate semantic segmentation is crucial for autonomous driving scene parsing. Two‐branch networks, which capture detailed and semantic information efficiently in parallel, have been widely used in real‐time semantic segmentation. This study proposes a network named MRFNet, based on a two‐branch strategy, to balance segmentation accuracy and speed in urban scenes. Many real‐time networks do not comprehensively consider contextual information from sub‐regions in different directions and at different scales. To address this, a Multi‐directional Feature Refinement Module (MFRM) with three sub‐paths that capture information at different scales and in different directions is proposed; the MFRM reduces computation by using strip pooling and dilated convolution operations. In particular, the authors propose a Feature Cross‐guide Aggregation Module that aggregates detailed and contextual information through the mutual guidance of detailed and semantic features, steering feature extraction in a more precise direction. Experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the method by achieving a balance between accuracy and inference speed. Specifically, on a single 1080Ti GPU, the method yields 78.9% mean intersection over union (mIoU) at 144.5 frames per second (FPS) on Cityscapes and 77.4% mIoU at 120.8 FPS on CamVid.
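The abstract does not give the MFRM's exact design, but the strip pooling it builds on is a simple operation: average each row and each column of the feature map, then broadcast the two directional summaries back over the full grid. A minimal NumPy sketch of that idea (the fusion by element-wise sum is an assumption for illustration, not the paper's architecture):

```python
import numpy as np

def strip_pool(x):
    """Directional strip pooling on a feature map of shape (C, H, W).

    Each row is averaged into a (C, H, 1) horizontal strip and each
    column into a (C, 1, W) vertical strip; broadcasting the sum back
    restores the (C, H, W) shape with long-range context in both
    directions. Fusing by addition is an illustrative choice here.
    """
    h_strip = x.mean(axis=2, keepdims=True)  # (C, H, 1): per-row context
    w_strip = x.mean(axis=1, keepdims=True)  # (C, 1, W): per-column context
    return h_strip + w_strip                 # broadcasts to (C, H, W)

x = np.arange(24, dtype=float).reshape(2, 3, 4)
y = strip_pool(x)
print(y.shape)  # (2, 3, 4)
```

Because each output position only needs one row mean and one column mean, the cost is linear in H + W per position rather than quadratic, which is why the abstract cites strip pooling (alongside dilated convolution) as a way to keep the MFRM's computation low.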

Keywords