Multimodal Representation Learning for Place Recognition Using Deep Hebbian Predictive Coding

Martin J. Pearson; Shirin Dora; Oliver Struckmeier; Thomas C. Knowles; Ben Mitchinson; Kshitij Tiwari; Ville Kyrki; Sander Bohte; Cyriel M.A. Pennartz

doi:10.3389/frobt.2021.732023

Frontiers in Robotics and AI (Dec 2021)

Affiliations

Martin J. Pearson: Bristol Robotics Laboratory, University of the West of England, Bristol, United Kingdom
Shirin Dora: Department of Computer Science, Loughborough University, Loughborough, United Kingdom
Shirin Dora: Center for Mathematics and Informatics, Amsterdam, Netherlands
Oliver Struckmeier: Intelligent Robotics Group, Aalto University, Helsinki, Finland
Thomas C. Knowles: Bristol Robotics Laboratory, University of the West of England, Bristol, United Kingdom
Ben Mitchinson: Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
Kshitij Tiwari: Intelligent Robotics Group, Aalto University, Helsinki, Finland
Ville Kyrki: Intelligent Robotics Group, Aalto University, Helsinki, Finland
Sander Bohte: Center for Mathematics and Informatics, Amsterdam, Netherlands
Sander Bohte: Department of Cognitive and Systems Neuroscience, University of Amsterdam, Amsterdam, Netherlands
Cyriel M.A. Pennartz: Department of Cognitive and Systems Neuroscience, University of Amsterdam, Amsterdam, Netherlands

Journal volume & issue: Vol. 8

Abstract

Recognising familiar places is a competence required in many engineering applications that interact with the real world, such as robot navigation. Combining information from different sensory sources promotes robustness and accuracy of place recognition. However, mismatches in data registration, dimensionality, and timing between modalities remain challenging problems in multisensory place recognition. Spurious data generated by sensor drop-out in multisensory environments is particularly problematic and is often resolved through ad hoc and brittle solutions. An effective approach to these problems is demonstrated by animals as they gracefully move through the world. Therefore, we take a neuro-ethological approach by adopting self-supervised representation learning based on a neuroscientific model of visual cortex known as predictive coding. We demonstrate how this parsimonious network algorithm, which is trained using a local learning rule, can be extended to combine visual and tactile sensory cues from a biomimetic robot as it naturally explores a visually aliased environment. The place recognition performance obtained using joint latent representations generated by the network is significantly better than that obtained using contemporary representation learning techniques. Further, we see evidence of improved robustness of place recognition in the face of unimodal sensor drop-out. The proposed multimodal deep predictive coding algorithm is also linearly extensible to accommodate more than two sensory modalities, thereby providing an intriguing example of the value of neuro-biologically plausible representation learning for multimodal navigation.
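
To make the idea of a joint latent representation trained with a local rule more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single linear layer (the paper describes a deep network), synthetic data in place of robot recordings, and hypothetical names throughout (`infer`, `learn`, `mean_error`, the modality sizes, and all learning rates). Each modality has its own weight matrix that predicts its input from a shared latent vector; inference adjusts the latent vector to reduce the summed prediction error, and learning changes each weight using only its own prediction error and the latent activity. A dropped-out modality is simply skipped, and further modalities can be added as extra dictionary entries.

```python
import numpy as np

rng = np.random.default_rng(0)

def infer(inputs, weights, n_latent, steps=30, lr_r=0.1):
    """Settle a joint latent vector r by reducing the summed prediction error.

    inputs maps modality name -> input vector, or None if that sensor dropped out.
    """
    r = np.zeros(n_latent)
    for _ in range(steps):
        drive = np.zeros(n_latent)
        for name, x in inputs.items():
            if x is None:  # drop-out: this modality contributes no error signal
                continue
            error = x - weights[name] @ r      # prediction error for this modality
            drive += weights[name].T @ error   # error fed back to the shared latent
        r += lr_r * drive
    return r

def learn(inputs, weights, r, lr_w=0.005):
    """Local update (assumed form): weight change = prediction error x latent activity."""
    for name, x in inputs.items():
        if x is not None:
            error = x - weights[name] @ r
            weights[name] += lr_w * np.outer(error, r)

def mean_error(samples, weights, n_latent):
    """Average squared prediction error over all samples and modalities."""
    total = 0.0
    for inputs in samples:
        r = infer(inputs, weights, n_latent)
        total += sum(np.sum((x - weights[name] @ r) ** 2) for name, x in inputs.items())
    return total / len(samples)

# Synthetic stand-in data: two modalities driven by one hidden 3-D cause.
sizes = {"visual": 12, "tactile": 6}   # hypothetical input dimensions
n_latent = 4
mixing = {name: rng.normal(size=(d, 3)) / np.sqrt(3) for name, d in sizes.items()}
samples = []
for _ in range(200):
    z = rng.normal(size=3)
    samples.append({name: mixing[name] @ z for name in sizes})

weights = {name: rng.normal(scale=0.1, size=(d, n_latent)) for name, d in sizes.items()}

err_before = mean_error(samples, weights, n_latent)
for _ in range(5):                      # a few passes over the data
    for inputs in samples:
        r = infer(inputs, weights, n_latent)
        learn(inputs, weights, r)
err_after = mean_error(samples, weights, n_latent)

# Drop-out probe: infer the latent from vision alone, then predict touch from it.
probe = dict(samples[0], tactile=None)
r_visual_only = infer(probe, weights, n_latent)
tactile_guess = weights["tactile"] @ r_visual_only
```

In this sketch the cost of inference and learning grows by one weight matrix per added modality, which is one way to read the abstract's claim of linear extensibility; the actual architecture and learning rule are specified in the paper itself.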

Published in Frontiers in Robotics and AI

ISSN: 2296-9144 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Mechanical engineering and machinery; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/robotics-and-ai
