Enhancing human computer interaction with coot optimization and deep learning for multi language identification

Elvir Akhmetshin; Galina Meshkova; Maria Mikhailova; Rustem Shichiyakh; Gyanendra Prasad Joshi; Woong Cho

doi:10.1038/s41598-024-74327-2

Scientific Reports (Oct 2024)

Affiliations

Elvir Akhmetshin: Candidate of Economic Sciences, Department of Economics and Management, Kazan Federal University, Elabuga Institute of KFU
Galina Meshkova: Candidate of Economic Sciences, Department of Innovative Entrepreneurship, Bauman Moscow State Technical University
Maria Mikhailova: Candidate of Medical Sciences, Department of Prosthetic Dentistry, Sechenov First Moscow State Medical University
Rustem Shichiyakh: Candidate of Economic Sciences, Department of Management, Kuban State Agrarian University named after I.T. Trubilin
Gyanendra Prasad Joshi: Department of AI Software, Kangwon National University
Woong Cho: Department of Electronics, Information and Communication Engineering, Kangwon National University

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 20

Abstract

Human-Computer Interaction (HCI) is a multidisciplinary field focused on the design and use of computer technology, with particular emphasis on the interface between computers and humans. HCI aims to build systems that let users interact with computers effectively, efficiently, and pleasantly. Multiple Spoken Language Identification (MSLI) for HCI denotes the ability of a computer system to recognize and distinguish between various spoken languages, enabling richer and more convenient interactions between users and technology. SLI with deep learning (DL) uses multi-layer artificial neural networks (ANNs) to automatically detect and recognize the language spoken in an audio signal. DL techniques, particularly neural networks (NNs), have succeeded in various pattern recognition tasks, including speech and language processing. This paper develops a novel Coot Optimizer Algorithm with DL-Driven Multiple SLI and Detection (COADL-MSLID) technique for HCI applications. The COADL-MSLID approach aims to detect multiple spoken languages from input audio regardless of the speaker's gender, speaking style, and age. In the COADL-MSLID technique, the audio files are first transformed into spectrogram images. Next, the technique employs the SqueezeNet model to produce feature vectors, and the COA is applied to tune the hyperparameters of the SqueezeNet model. Finally, a convolutional autoencoder (CAE) model performs the SLID process. To demonstrate the effectiveness of the COADL-MSLID technique, a series of experiments was conducted on a benchmark dataset. The experimental validation shows that the COADL-MSLID technique achieves a higher accuracy of 98.33% than the other techniques compared.
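The abstract states only that audio files are converted into spectrogram images; it does not give the window, frame length, hop size, or scaling. The sketch below is a minimal illustration under assumed settings: a short-time Fourier transform with a periodic Hann window, 256-sample frames, a 128-sample hop, and log-magnitude (dB) scaling. It uses a naive DFT in plain Python for clarity rather than speed, and the function names `hann` and `spectrogram_db` are hypothetical.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window of length n (assumed window choice)
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectrogram_db(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + k] * window[k] for k in range(frame_len)]
        row = []
        for b in range(n_bins):
            # Naive DFT coefficient for bin b (O(N) per bin; fine for a demo)
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * b * k / frame_len)
                      for k in range(frame_len))
            row.append(20 * math.log10(abs(acc) + eps))
        frames.append(row)
    return frames


# Example: a 1 kHz tone sampled at 8 kHz for 0.25 s
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2000)]
spec = spectrogram_db(tone)
peak = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), len(spec[0]), peak)  # 14 129 32
```

A 1 kHz tone at 8 kHz falls exactly on bin 1000 × 256 / 8000 = 32, so every frame peaks there. In practice an FFT routine would replace the naive DFT, and the dB matrix would be rescaled (for example to the 0 to 255 range) before being saved as an image for the CNN.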
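SqueezeNet, used here as the feature extractor, is compact because of its Fire modules: a 1x1 "squeeze" convolution reduces the channel count before parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. The arithmetic below compares the parameter count of one Fire module with a plain 3x3 convolution producing the same 128 output channels. The channel sizes (96 in, 16 squeeze, 64 + 64 expand) are an assumption chosen to resemble the first Fire module of SqueezeNet v1.0; the abstract does not say which SqueezeNet variant or configuration the authors used.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * in_ch * out_ch + out_ch


def fire_params(in_ch, s1x1, e1x1, e3x3):
    """Parameters of a Fire module: 1x1 squeeze layer feeding
    parallel 1x1 and 3x3 expand layers (outputs concatenated)."""
    squeeze = conv_params(in_ch, s1x1, 1)
    expand = conv_params(s1x1, e1x1, 1) + conv_params(s1x1, e3x3, 3)
    return squeeze + expand


fire = fire_params(96, 16, 64, 64)  # 64 + 64 = 128 output channels
plain = conv_params(96, 128, 3)     # plain 3x3 conv, same output width
print(fire, plain, round(plain / fire, 1))  # 11920 110720 9.3
```

The squeeze layer is what keeps the expensive 3x3 filters cheap: they see 16 channels instead of 96, which is why the module needs roughly a ninth of the parameters in this configuration.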
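The abstract says the COA tunes SqueezeNet's hyperparameters but does not list which hyperparameters, their ranges, or the fitness function. The following is a simplified sketch of COOT-style update rules as we understand them (movement around an assigned leader, chain movement, random movement, coot/leader swapping, and leader movement relative to the global best). The coefficient schedules `a` and `b`, the extra global-best check after every evaluation, and the objective `toy_val_loss` (a hypothetical stand-in for validation loss over two normalized hyperparameters) are assumptions, not the authors' implementation.

```python
import math
import random


def coot_optimize(objective, lb, ub, n_coots=20, n_leaders=5, iters=100, seed=0):
    """Simplified COOT-style minimizer over the box [lb, ub]."""
    rng = random.Random(seed)
    dim = len(lb)

    def clip(x):
        return [min(max(v, lb[d]), ub[d]) for d, v in enumerate(x)]

    def rand_pos():
        return [lb[d] + rng.random() * (ub[d] - lb[d]) for d in range(dim)]

    coots = [rand_pos() for _ in range(n_coots)]
    leaders = [rand_pos() for _ in range(n_leaders)]
    coot_fit = [objective(c) for c in coots]
    leader_fit = [objective(p) for p in leaders]

    # Global best over the whole initial population
    pool = list(zip(coot_fit + leader_fit, coots + leaders))
    g_fit, g_pos = min(pool, key=lambda fp: fp[0])
    g_best = list(g_pos)
    history = [g_fit]

    for t in range(1, iters + 1):
        a = 1 - t / iters  # random-movement step size, decays 1 -> 0
        b = 2 - t / iters  # leader-movement step size, decays 2 -> 1
        for i in range(n_coots):
            k = i % n_leaders  # leader followed by coot i
            if rng.random() < 0.5:
                # Move around the assigned leader
                r1, r = rng.random(), rng.uniform(-1, 1)
                new = [lk + 2 * r1 * math.cos(2 * math.pi * r) * (lk - c)
                       for lk, c in zip(leaders[k], coots[i])]
            elif rng.random() < 0.5 and i > 0:
                # Chain movement: midpoint with the previous coot
                new = [0.5 * (p + c) for p, c in zip(coots[i - 1], coots[i])]
            else:
                # Random movement toward a random point in the search space
                q, r2 = rand_pos(), rng.random()
                new = [c + a * r2 * (qd - c) for c, qd in zip(coots[i], q)]
            coots[i] = clip(new)
            coot_fit[i] = objective(coots[i])
            if coot_fit[i] < g_fit:
                g_best, g_fit = list(coots[i]), coot_fit[i]
            # A coot that beats its leader swaps places with it
            if coot_fit[i] < leader_fit[k]:
                coots[i], leaders[k] = leaders[k], coots[i]
                coot_fit[i], leader_fit[k] = leader_fit[k], coot_fit[i]
        for j in range(n_leaders):
            # Leaders move relative to the global best (sign chosen at random)
            r3, r = rng.random(), rng.uniform(-1, 1)
            sign = 1 if rng.random() < 0.5 else -1
            leaders[j] = clip([b * r3 * math.cos(2 * math.pi * r) * (g - p) + sign * g
                               for g, p in zip(g_best, leaders[j])])
            leader_fit[j] = objective(leaders[j])
            if leader_fit[j] < g_fit:
                g_best, g_fit = list(leaders[j]), leader_fit[j]
        history.append(g_fit)
    return g_best, g_fit, history


def toy_val_loss(x):
    # Hypothetical stand-in for SqueezeNet validation loss at hyperparameters x
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


best_pos, best_score, history = coot_optimize(toy_val_loss, [0.0, 0.0], [1.0, 1.0])
print(best_pos, best_score)
```

In a real tuning run, each call to the objective would train or fine-tune SqueezeNet with the decoded hyperparameters (for example, a learning rate mapped from the unit interval to a log scale) and return a validation error, so the population size and iteration count would be kept small.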

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/