IEEE Access (Jan 2021)

Transfer Learning for Humanoid Robot Appearance-Based Localization in a Visual Map

  • Emmanuel Ovalle-Magallanes,
  • Noe G. Aldana-Murillo,
  • Juan Gabriel Avina-Cervantes,
  • Jose Ruiz-Pinales,
  • Jonathan Cepeda-Negrete,
  • Sergio Ledesma

DOI
https://doi.org/10.1109/ACCESS.2020.3048936
Journal volume & issue
Vol. 9
pp. 6868–6877

Abstract


Autonomous robot visual navigation is a fundamental locomotion task that extracts relevant features from images of the surrounding environment to control the robot's displacement. During navigation, a known visual map helps obtain an accurate localization; in the absence of this map, a guided or free exploration path must be executed to acquire the image sequence that represents the visual map. This paper presents an appearance-based localization method built on a visual map and an end-to-end Convolutional Neural Network (CNN). The CNN is initialized via transfer learning (pre-trained on the ImageNet dataset), and four state-of-the-art architectures are evaluated: VGG16, ResNet50, InceptionV3, and Xception. A typical transfer-learning pipeline replaces the last layer so that the number of output neurons matches the number of custom classes. In this work, the dense layers after the convolutional and pooling layers were substituted by a Global Average Pooling (GAP) layer, which is parameter-free. Additionally, an L2-norm constraint was added to the GAP feature descriptors, constraining the features to lie on a fixed-radius hypersphere. These pre-trained configurations were analyzed and compared using two visual maps from the CIMAT-NAO datasets, consisting of 187 and 94 images, respectively. For evaluating the localization task, sets of 278 and 94 images were available for each visual map, respectively. The numerical results showed that integrating the L2-norm constraint into the training pipeline boosts appearance-based localization performance. Specifically, the pre-trained VGG16 and Xception networks achieved the best localization results, reaching top-3 accuracies of 90.70% and 93.62% on each dataset, respectively, outperforming the reference approaches based on hand-crafted feature extractors.
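The head described in the abstract (pre-trained backbone, parameter-free GAP layer, L2-normalized descriptor on a fixed-radius hypersphere, softmax classifier) can be sketched as follows. This is a minimal sketch assuming a TensorFlow/Keras implementation; the framework, the hypersphere radius `alpha`, the input size, and the use of one class per visual-map image are illustrative assumptions, not details confirmed by the abstract.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 187          # assumption: one class per image in the first visual map
IMG_SHAPE = (224, 224, 3)  # assumption: VGG16 default input size

# Backbone pre-trained on ImageNet; the original dense head is removed.
backbone = VGG16(weights="imagenet", include_top=False, input_shape=IMG_SHAPE)

inputs = layers.Input(shape=IMG_SHAPE)
x = backbone(inputs)
# Parameter-free Global Average Pooling replaces the original dense layers.
x = layers.GlobalAveragePooling2D()(x)
# L2-norm constraint: project the GAP descriptor onto a hypersphere of radius alpha.
alpha = 10.0  # hypothetical radius hyperparameter
x = layers.Lambda(lambda t: alpha * tf.math.l2_normalize(t, axis=1))(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    # Top-3 accuracy mirrors the metric reported in the abstract.
    metrics=[tf.keras.metrics.TopKCategoricalAccuracy(k=3)],
)
```

The same head can be attached to ResNet50, InceptionV3, or Xception by swapping the backbone import; only the expected input size changes.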

Keywords