TVENet: Transformer-Based Visual Exploration Network for Mobile Robot in Unseen Environment

Tianyao Zhang; Xiaoguang Hu; Jin Xiao; Guofeng Zhang

doi:10.1109/ACCESS.2022.3181989

IEEE Access (Jan 2022)

Affiliations

Tianyao Zhang: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Xiaoguang Hu: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Jin Xiao: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Guofeng Zhang: School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

DOI: https://doi.org/10.1109/ACCESS.2022.3181989
Journal volume & issue: Vol. 10, pp. 62056–62072

Abstract

This paper presents a Transformer-based Visual Exploration Network (TVENet) as a solution to active perception problems, in particular the visual exploration problem: how can a robot equipped with a camera explore an unknown 3D environment? TVENet consists of a Mapper, a Global Policy, and a Local Policy. The Mapper is trained by supervised learning to take visual observations as input and generate an occupancy grid map of the explored environment. The Global Policy and the Local Policy are trained by reinforcement learning to make navigation decisions. Most state-of-the-art methods in the visual exploration domain use ResNet as the feature extractor, and few of them pay attention to its extraction capability. This paper therefore focuses on enhancing the extraction capability and proposes a Transformer-based Feature Pyramid Module (TFPM). In addition, two training tricks (M.F. and Aux.) are introduced to improve performance. Experiments in a photo-realistic simulated environment (Habitat) demonstrate the higher mapping accuracy of TVENet. The results show that the TFPM and the two tricks have a positive impact on mapping accuracy in visual exploration, increasing it by 5.31% compared with the state of the art. Most importantly, TVENet is deployed on a real robot (NVIDIA Jetbot) to demonstrate the feasibility of Embodied AI approaches. To the authors’ knowledge, this is the first paper to demonstrate the viability of the Embodied AI style approach for visual exploration tasks and to deploy the pre-trained model on an NVIDIA Jetson robot.
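The division of labour among the three components named in the abstract (Mapper, Global Policy, Local Policy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the grid encoding, and the hand-written heuristics are hypothetical stand-ins for the learned networks, and none of it reproduces the TFPM, the training tricks, or any other detail of the authors' implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical cell encoding for an occupancy grid (not taken from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1
Grid = List[List[int]]
Cell = Tuple[int, int]


def mapper_update(grid: Grid, pose: Cell, observation: Grid) -> None:
    """Stand-in for the learned Mapper: write a local patch, centred on the
    robot pose, into the global occupancy grid (in place)."""
    r0 = pose[0] - len(observation) // 2
    c0 = pose[1] - len(observation[0]) // 2
    for dr, row in enumerate(observation):
        for dc, value in enumerate(row):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                grid[r][c] = value


def global_policy(grid: Grid, pose: Cell) -> Optional[Cell]:
    """Stand-in for the RL-trained Global Policy: choose the nearest unknown
    cell (Manhattan distance) as the long-term goal, or None if none remain."""
    unknown = [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == UNKNOWN
    ]
    if not unknown:
        return None
    return min(unknown, key=lambda rc: abs(rc[0] - pose[0]) + abs(rc[1] - pose[1]))


def _sign(a: int, b: int) -> int:
    """Return -1, 0, or 1 depending on whether b is below, equal to, or above a."""
    return (b > a) - (b < a)


def local_policy(pose: Cell, goal: Cell) -> Cell:
    """Stand-in for the RL-trained Local Policy: return a one-cell move
    (d_row, d_col) toward the goal, ignoring obstacles."""
    return (_sign(pose[0], goal[0]), _sign(pose[1], goal[1]))


if __name__ == "__main__":
    grid = [[UNKNOWN] * 5 for _ in range(5)]
    pose = (2, 2)
    mapper_update(grid, pose, [[FREE] * 3 for _ in range(3)])
    goal = global_policy(grid, pose)
    if goal is not None:
        print("goal:", goal, "move:", local_policy(pose, goal))
```

In this sketch the map is the only state shared between the components: the Mapper writes to it, the Global Policy reads it to pick a long-term goal, and the Local Policy turns that goal into a single low-level move. In the paper the two policies are learned with reinforcement learning rather than hand-coded as they are here.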

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
