Audio-Visual Perception System for a Humanoid Robotic Head

Raquel Viciana-Abad; Rebeca Marfil; Jose M. Perez-Lorenzo; Juan P. Bandera; Adrian Romero-Garces; Pedro Reche-Lopez

doi:10.3390/s140609522

Sensors (May 2014)

Affiliations

Raquel Viciana-Abad: Multimedia and Multimodal Processing Group, Polytechnic School of Linares, University of Jaén, Alfonso X El Sabio, 28, 23700 Linares, Spain
Rebeca Marfil: Dpto. Tecnología Electrónica, University of Málaga, Campus de Teatinos - 29071 Málaga, Spain
Jose M. Perez-Lorenzo: Multimedia and Multimodal Processing Group, Polytechnic School of Linares, University of Jaén, Alfonso X El Sabio, 28, 23700 Linares, Spain
Juan P. Bandera: Dpto. Tecnología Electrónica, University of Málaga, Campus de Teatinos - 29071 Málaga, Spain
Adrian Romero-Garces: Dpto. Tecnología Electrónica, University of Málaga, Campus de Teatinos - 29071 Málaga, Spain
Pedro Reche-Lopez: Multimedia and Multimodal Processing Group, Polytechnic School of Linares, University of Jaén, Alfonso X El Sabio, 28, 23700 Linares, Spain

DOI: https://doi.org/10.3390/s140609522
Journal volume & issue: Vol. 14, no. 6
pp. 9522 – 9545

Abstract

One of the main challenges in social robotics is to endow robots with the ability to direct their attention to the people with whom they are interacting. Several approaches follow bio-inspired mechanisms, merging audio and visual cues from multiple sensors to localize a person. However, most of these fusion mechanisms have been used in fixed systems, such as those installed in video-conference rooms, and thus they may run into difficulties when constrained to the sensors with which a robot can be equipped. Moreover, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been carried out in controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information through Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. The performance of this system is evaluated and compared with that of unimodal systems, taking their technical limitations into account. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
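
As an illustration of the kind of Bayesian audio-visual fusion the abstract refers to, the sketch below combines an audio cue and a visual cue into a single posterior over the speaker's azimuth. It is not the authors' implementation: the discretized azimuth grid, the Gaussian likelihood shapes, the assumption that the two cues are conditionally independent given the speaker position, and all numeric values (means, standard deviations) are hypothetical choices made only for this example.

```python
import math

def gaussian_likelihood(grid, mean_deg, sigma_deg):
    """Unnormalized Gaussian likelihood of a cue over an azimuth grid (assumed model)."""
    return [math.exp(-0.5 * ((a - mean_deg) / sigma_deg) ** 2) for a in grid]

def fuse(prior, *likelihoods):
    """Bayes rule on a discrete grid: posterior is proportional to prior times each likelihood.

    Assumes the cues are conditionally independent given the speaker position.
    """
    posterior = list(prior)
    for lik in likelihoods:
        posterior = [p * l for p, l in zip(posterior, lik)]
    total = sum(posterior)
    if total == 0.0:
        # No cue supports any hypothesis; fall back to the prior.
        return list(prior)
    return [p / total for p in posterior]

# Hypothetical azimuth hypotheses, in degrees, in front of the robot head.
grid = list(range(-90, 91, 5))
prior = [1.0 / len(grid)] * len(grid)  # uniform prior: no preferred direction

# Illustrative values only: a broad audio cue and a sharper visual cue.
audio = gaussian_likelihood(grid, mean_deg=20.0, sigma_deg=15.0)
visual = gaussian_likelihood(grid, mean_deg=10.0, sigma_deg=5.0)

posterior = fuse(prior, audio, visual)
audio_only = fuse(prior, audio)

# Maximum a posteriori estimate of the speaker azimuth.
estimate = grid[posterior.index(max(posterior))]
print("fused azimuth estimate (deg):", estimate)
```

In this toy setup the fused estimate is pulled towards the more precise (visual) cue, and the fused posterior is more peaked than the audio-only one, which is the qualitative benefit that fusion is expected to provide over a unimodal estimate.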

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
