IEEE Access (Jan 2024)

Semantic Scene Completion Through Context Transformer and Recurrent Convolution

  • Wenlong Yang,
  • Hongfei Yu,
  • Yang Cao

DOI
https://doi.org/10.1109/ACCESS.2024.3401481
Journal volume & issue
Vol. 12
pp. 69700 – 69709

Abstract

Monocular semantic scene completion aims to predict a detailed 3D scene with semantic information from a single image. To improve the image feature extraction ability of the classical network and achieve better semantic scene completion, we propose a monocular semantic scene completion method based on a context transformer and recurrent residual convolution. A context transformer module is added between the encoder and decoder of the image feature extraction network; it uses context information to guide the learning of the dynamic attention matrix and improves visual representation ability. We also introduce a recurrent residual convolution module into the decoder to accumulate features at different time steps, which helps to distinguish similar objects. Experimental results show that, on the indoor dataset NYUv2 and the outdoor traffic scene dataset SemanticKITTI, the mIoU of the semantic scene completion task improves by 5% and 8%, respectively, compared with the baseline method.
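
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed over per-voxel class labels. It is not taken from the paper: the function name `mean_iou`, the `ignore_label` value of 255, and the choice to skip classes that never occur are illustrative assumptions, and the authors' evaluation protocol may differ (for example, in how empty space is handled).

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class labels, one per voxel.
    ignore_label: hypothetical marker for voxels excluded from evaluation.
    Predictions are assumed to lie in the range [0, num_classes).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled voxel: contributes to no class
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    # Average only over classes that appear at least once.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2]
    target = [0, 1, 1, 1, 255]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

In this toy input, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3; class 2 is skipped because its only occurrence falls on an ignored voxel.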

Keywords