Frontiers in Neurorobotics (Nov 2024)
Multimodal fusion-powered English speaking robot
Abstract
Introduction
Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity.

Methods
To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time speech and multimodal interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language.

Results and discussion
In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that, compared to traditional unimodal processing methods, our model significantly improves both language understanding accuracy and response time. This research not only enhances the performance of multimodal interaction in robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.
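The fusion of visual and textual information described above can be sketched, in its simplest late-fusion form, as a weighted blend of two encoder outputs. The function and weight below are illustrative assumptions for exposition only, not the architecture used in EnglishAL-Net or ALBEF:

```python
# Minimal late-fusion sketch: blend a text embedding and an image
# embedding of equal length with a mixing weight alpha on the text side.
# All names and values here are hypothetical, chosen for illustration.

def fuse_embeddings(text_emb, image_emb, alpha=0.5):
    """Return the element-wise blend alpha*text + (1 - alpha)*image."""
    assert len(text_emb) == len(image_emb), "embeddings must match in size"
    return [alpha * t + (1 - alpha) * v for t, v in zip(text_emb, image_emb)]

# Toy embeddings standing in for the outputs of a text encoder and a
# vision encoder, respectively.
text_emb = [0.2, 0.8, 0.5]
image_emb = [0.6, 0.4, 0.1]
fused = fuse_embeddings(text_emb, image_emb, alpha=0.5)
print(fused)
```

In practice, models in the ALBEF family use learned cross-attention between modalities rather than a fixed blend; this sketch only conveys the idea that both modalities contribute to a single joint representation.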
Keywords