A duplex transform heterogeneous feature fusion network for road segmentation

Zhiyang Guo; Xing Hu; Jiejia Wang; XiaoYu Miao; MengTeng Sun; HuaiWei Wang; XueYing Ma

doi:10.1038/s41598-024-68255-4

Scientific Reports (Jul 2024)


Affiliations

Zhiyang Guo: School of Traffic Engineering, Jiangsu Shipping College
Xing Hu: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science & Technology
Jiejia Wang: School of Traffic Engineering, Jiangsu Shipping College
XiaoYu Miao: School of Traffic Engineering, Jiangsu Shipping College
MengTeng Sun: School of Traffic Engineering, Jiangsu Shipping College
HuaiWei Wang: School of Traffic Engineering, Jiangsu Shipping College
XueYing Ma: School of Traffic Engineering, Jiangsu Shipping College

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 13

Abstract


Detecting roads in autonomous driving environments is challenging because of boundary fuzziness, occlusion, and light glare. We believe two factors are instrumental in addressing these challenges and improving detection performance: global context dependency, and an effective feature representation that prioritizes important feature channels. To address these issues, we introduce DTRoadseg, a novel duplex Transformer-based heterogeneous feature fusion network for road segmentation. DTRoadseg uses a duplex encoder architecture to extract heterogeneous features from both RGB images and point-cloud depth images. We then introduce a multi-source Heterogeneous Feature Reinforcement Block (HFRB) to fuse the encoded features; it comprises a Heterogeneous Feature Fusion Module (HFFM) and a Reinforcement Fusion Module (RFM). The HFFM uses the self-attention mechanism of Transformers to fuse features through token interactions, while the RFM reinforces the fusion by emphasizing informative features and suppressing less important ones. Finally, a Transformer decoder produces the final semantic prediction. In addition, we employ a boundary loss function to refine the structure of the segmented regions, reduce false detections, and improve model accuracy. Extensive experiments on the KITTI road dataset show that DTRoadseg outperforms state-of-the-art methods, achieving an average accuracy of 97.01% and a recall of 96.35% while running at 0.09 s per image.
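The general idea of the fusion step (token interaction through self-attention, followed by channel reweighting) can be illustrated with a toy, dependency-free sketch. It is not the authors' HFRB. It assumes a single attention head with identity query/key/value projections, and it uses a squeeze-and-excitation-style channel gate as a stand-in for the RFM. The functions `self_attention`, `channel_gate`, and `fuse` and the bottleneck weights `w1`, `w2` are hypothetical names chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature channels


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: Q = K = V = tokens (identity projections).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output token is a convex combination of all input tokens.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def channel_gate(tokens: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """SE-style channel reweighting (hypothetical stand-in for the RFM).

    w1 has shape (r, d) and w2 has shape (d, r) for a bottleneck of size r.
    """
    n, d = len(tokens), len(tokens[0])
    # Squeeze: average each channel over all tokens.
    squeeze = [sum(t[c] for t in tokens) / n for c in range(d)]
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale every channel by its gate in (0, 1).
    return [[t[c] * gates[c] for c in range(d)] for t in tokens]


def fuse(rgb_tokens: Matrix, depth_tokens: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Concatenate both modalities' tokens, mix them with attention, then gate channels."""
    tokens = rgb_tokens + depth_tokens
    attended = self_attention(tokens)
    # Residual connection around the attention step.
    mixed = [[a + b for a, b in zip(x, y)] for x, y in zip(tokens, attended)]
    return channel_gate(mixed, w1, w2)


if __name__ == "__main__":
    rgb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    depth = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    w1 = [[0.5] * 4, [-0.5] * 4]
    w2 = [[1.0, 0.0] for _ in range(4)]
    fused = fuse(rgb, depth, w1, w2)
    print(len(fused), len(fused[0]))
```

Because attention lets every RGB token attend to every depth token (and vice versa), each fused token can draw on context from the other modality. The sigmoid gate can only shrink a channel, which is one simple way to downweight less informative channels.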

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
