Speech recognition and intelligent translation under multimodal human–computer interaction system

Huang Danhua; Xiang Shuaiqiu

doi:10.1515/jisys-2023-0192

Journal of Intelligent Systems (Sep 2024)

Affiliations

Huang Danhua: School of English Studies, Zhejiang Yuexiu University, Shaoxing, 312000, China
Xiang Shuaiqiu: School of Software, Shenzhen Institute of Information Technology, Shenzhen, 518172, China

DOI: https://doi.org/10.1515/jisys-2023-0192
Journal volume & issue: Vol. 33, no. 1
pp. 798 – 810

Abstract

Traditional translation robots are limited to translating single-modality input, such as text in images or video, and suffer from low translation accuracy. This article therefore proposes a method for speech recognition and intelligent translation in a multimodal human–computer interaction (HCI) system. First, the network structure of the speech recognition model for the multi-channel HCI system is established, and a multi-head self-attention mechanism is constructed. Then, an artificial intelligence voice wake-up function is designed, and a multimodal machine translation model is built. On this basis, selective attention is added to obtain text-aware visual features, and multimodal gated fusion is applied at the decoder to produce the translation output. Experimental results show that the method achieves a high BLEU score and high translation accuracy.

Published in Journal of Intelligent Systems

ISSN: 0334-1860 (Print); 2191-026X (Online)
Publisher: De Gruyter
Country of publisher: Poland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.degruyter.com/view/journals/jisys/jisys-overview.xml
