CAAI Transactions on Intelligence Technology (Dec 2023)

Vision‐audio fusion SLAM in dynamic environments

  • Tianwei Zhang,
  • Huayan Zhang,
  • Xiaofei Li

DOI
https://doi.org/10.1049/cit2.12206
Journal volume & issue
Vol. 8, no. 4
pp. 1364 – 1373

Abstract

Moving humans, agents, and other subjects pose many challenges to robot self‐localisation and environment perception. To adapt to dynamic environments, SLAM researchers typically apply several deep learning image segmentation models to eliminate these moving obstacles. However, such segmentation methods demand too many computational resources for onboard processing on mobile robots. In current industrial scenarios where mobile robots collaborate, the noise emitted by the robots can be easily detected by on‐board audio‐sensing processors, and the direction of a sound source can be effectively estimated by sound source estimation algorithms, but estimating its distance is difficult. In visual perception, by contrast, the 3D structure of the scene is relatively easy to obtain, whereas recognising and segmenting moving objects is more difficult. To address these complementary problems, a novel vision‐audio fusion method is proposed that combines sound source localisation with a visual SLAM scheme, thereby eliminating the effect of dynamic obstacles on multi‐agent systems. Experiments with several heterogeneous robots in different dynamic scenes indicate that the proposed method achieves very stable self‐localisation and environment reconstruction performance.

Keywords