Sensors (Oct 2024)

Effective Acoustic Model-Based Beamforming Training for Static and Dynamic HRI Applications

  • Alejandro Luzanto,
  • Nicolás Bohmer,
  • Rodrigo Mahu,
  • Eduardo Alvarado,
  • Richard M. Stern,
  • Néstor Becerra Yoma

DOI
https://doi.org/10.3390/s24206644
Journal volume & issue
Vol. 24, no. 20
p. 6644

Abstract

Human–robot collaboration will play an important role in the fourth industrial revolution in applications related to hostile environments, mining, industry, forestry, education, natural disaster response and defense. Effective collaboration requires robots to understand human intentions and tasks, which involves advanced user profiling. Voice-based communication, rich in complex information, is key to this profiling. Beamforming, a technology that enhances speech signals, can help robots extract semantic, emotional, or health-related information from speech. This paper describes the implementation of a system that provides substantially improved signal-to-noise ratio (SNR) and speech recognition accuracy on a moving robotic platform for use in human–robot interaction (HRI) applications in static and dynamic contexts. This study focuses on training deep learning-based beamformers using acoustic model-based multi-style training with measured room impulse responses (RIRs). The results show that this approach outperforms training with simulated RIRs or matched measured RIRs, especially in dynamic conditions involving robot motion. The findings suggest that training with a broad range of measured RIRs is sufficient for effective HRI in various environments, making additional data recording or augmentation unnecessary. This research demonstrates that deep learning-based beamforming can significantly improve HRI performance, particularly in challenging acoustic environments, surpassing traditional beamforming methods.
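
The multi-style training with measured RIRs mentioned in the abstract can be pictured with a generic data-generation sketch: a clean utterance is convolved with a measured room impulse response and mixed with noise at a target SNR to yield a reverberant, noisy training example. The snippet below is a minimal illustration of that general idea, not the authors' actual pipeline; the function name reverberate_and_mix, the toy decaying RIR, and the 5 dB SNR setting are assumptions made only for demonstration.

```python
import numpy as np
from scipy.signal import fftconvolve

def reverberate_and_mix(clean, rir, noise, snr_db):
    """Build one multi-style training example: convolve clean speech with a
    measured RIR and add noise scaled to a target SNR (illustrative only)."""
    # Reverberant speech: convolve the dry utterance with the measured RIR.
    reverb = fftconvolve(clean, rir, mode="full")[: len(clean)]

    # Tile or trim the noise so it matches the utterance length.
    if len(noise) < len(reverb):
        noise = np.tile(noise, int(np.ceil(len(reverb) / len(noise))))
    noise = noise[: len(reverb)]

    # Scale the noise to reach the requested signal-to-noise ratio.
    speech_power = np.mean(reverb ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return reverb + scale * noise

# Toy example: 1 s of stand-in "speech", a synthetic decaying RIR, white noise, 5 dB SNR.
fs = 16000
clean = np.random.randn(fs).astype(np.float32)
rir = np.exp(-np.linspace(0.0, 8.0, fs // 2)) * np.random.randn(fs // 2)
noise = np.random.randn(fs).astype(np.float32)
noisy_reverb = reverberate_and_mix(clean, rir, noise, snr_db=5.0)
```

In the paper's setting the RIRs would be measured in real rooms (and, for the dynamic case, along robot trajectories) rather than synthesized as above; the sketch only shows the basic convolve-and-mix step that underlies RIR-based multi-style data generation.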

Keywords