Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children

Yaroslav Getman; Nhan Phan; Ragheb Al-Ghezi; Ekaterina Voskoboinik; Mittul Singh; Tamas Grosz; Mikko Kurimo; Giampiero Salvi; Torbjorn Svendsen; Sofia Strombergsson; Anna Smolander; Sari Ylinen

doi:10.1109/ACCESS.2023.3304274

IEEE Access (Jan 2023)

Affiliations

Yaroslav Getman: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Nhan Phan: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Ragheb Al-Ghezi: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Ekaterina Voskoboinik: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Mittul Singh: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Tamas Grosz: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Mikko Kurimo: Department of Information and Communications Engineering, Aalto University, Espoo, Finland
Giampiero Salvi: Department of Signal Processing, Norwegian University of Science and Technology, Trondheim, Norway
Torbjorn Svendsen: Department of Signal Processing, Norwegian University of Science and Technology, Trondheim, Norway
Sofia Strombergsson: Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Huddinge, Sweden
Anna Smolander: Logopedics, Welfare Sciences, Faculty of Social Sciences, Tampere University, Tampere, Finland
Sari Ylinen: Logopedics, Welfare Sciences, Faculty of Social Sciences, Tampere University, Tampere, Finland

DOI: https://doi.org/10.1109/ACCESS.2023.3304274
Journal volume & issue: Vol. 11, pp. 86025–86037

Abstract

Computer-Assisted Language Learning (CALL) is a rapidly developing area accelerated by advances in AI. A well-designed and reliable CALL system allows students to practice language skills, such as pronunciation, at any time outside the classroom. Furthermore, gamification via mobile applications has shown encouraging results on learning outcomes: it motivates young users to practice more and to perceive language learning as a positive experience. In this work, we adapt the latest speech recognition technology to serve as part of an online pronunciation training system for small children. As part of our gamified mobile application, our models will assess the pronunciation quality of young Swedish children who have been diagnosed with Speech Sound Disorder and are participating in speech therapy. Additionally, the models provide feedback to young non-native children learning to pronounce Swedish and Finnish words. Our experiments revealed that these new models fit into an online game because they function as speech recognizers and pronunciation evaluators simultaneously. To make our systems more trustworthy and explainable, we investigated whether the combination of modern input attribution algorithms and time-aligned transcripts can explain the decisions made by the models, give us insights into how the models work, and provide a tool for developing more reliable solutions.
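
The abstract does not spell out how input attributions and time-aligned transcripts are combined. One plausible reading, sketched below purely as an illustration, is to sum frame-level attribution scores inside each aligned segment so that a model decision can be traced to specific words or phones. Everything in the sketch is an assumption and not the authors' method: the `Segment` type, the function name, the 20 ms frame shift, and the choice of summed absolute scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One time-aligned unit (word or phone) of a transcript. Hypothetical type."""
    label: str
    start: float  # segment start in seconds
    end: float    # segment end in seconds


def attribution_share_per_segment(
    frame_scores: List[float],
    segments: List[Segment],
    frame_shift: float = 0.02,  # assumed seconds per frame, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return each segment's share of the total absolute attribution."""
    totals = []
    for seg in segments:
        # Map segment boundaries in seconds to frame indices.
        first = int(round(seg.start / frame_shift))
        last = int(round(seg.end / frame_shift))
        totals.append(sum(abs(score) for score in frame_scores[first:last]))

    grand_total = sum(totals)
    if grand_total == 0:
        # No attribution mass at all: report zero for every segment.
        return [(seg.label, 0.0) for seg in segments]
    return [(seg.label, total / grand_total) for seg, total in zip(segments, totals)]


if __name__ == "__main__":
    # Toy data: six frames of made-up attribution scores over three aligned phones.
    scores = [0.1, 0.1, 0.9, 0.7, 0.1, 0.1]
    phones = [Segment("k", 0.00, 0.04), Segment("a", 0.04, 0.08), Segment("t", 0.08, 0.12)]
    for label, share in attribution_share_per_segment(scores, phones):
        print(f"{label}: {share:.2f}")
```

In this toy input most of the attribution mass falls inside the middle segment, which is the kind of signal that could point a developer or therapist to the sound that drove a pronunciation score. The frame-level scores themselves would come from whichever attribution algorithm is applied to the model.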

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
