Towards Deep Object Detection Techniques for Phoneme Recognition

Mohammed Algabri; Hassan Mathkour; Mohamed Abdelkader Bencherif; Mansour Alsulaiman; Mohamed Amine Mekhtiche

doi:10.1109/ACCESS.2020.2980452

IEEE Access (Jan 2020)

Affiliations

Mohammed Algabri: Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Hassan Mathkour: Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Mohamed Abdelkader Bencherif: Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Mansour Alsulaiman: Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Mohamed Amine Mekhtiche: Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2020.2980452
Journal volume & issue: Vol. 8, pp. 54663–54680

Abstract

This study investigates the use of cutting-edge object detection techniques to build an accurate phoneme sequence recognition system for the English and Arabic languages. Numerous deep learning techniques have recently been proposed for object detection in everyday applications. In this paper, we propose applying object detection techniques to speech processing tasks. We selected two state-of-the-art object detectors, YOLO and CenterNet, based on the trade-off between detection accuracy and speed. We tackled phoneme sequence recognition using three systems: a domain transfer learning system (DTS) from image to speech, an intra-language transfer learning system (IaTS) between speech corpora within the same language (English to English), and an inter-language transfer learning system (IeTS) between speech corpora from dissimilar languages (English to Arabic). For English phoneme recognition, the Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus is used to evaluate the performance of the proposed systems. Our IaTS based on the CenterNet detector achieves the best result on the TIMIT core test set, with a phone error rate (PER) of 15.89%. For Arabic phoneme recognition, the best performance, a PER of 7.58%, was also achieved using CenterNet. These results show the effectiveness of object detection techniques in phoneme recognition tasks. Furthermore, the findings of this study suggest that speech processing tasks may be treated as object detection tasks.
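The abstract reports its results as phone error rate (PER) but does not spell out how the metric is computed. As a rough illustration, the sketch below assumes the conventional definition: the edit distance (substitutions, deletions, and insertions) between the reference and recognized phone sequences, divided by the number of reference phones. The function names and the example sequences are hypothetical and are not taken from the paper, whose exact scoring setup may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a spurious phone
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER in percent, assuming the standard (S + D + I) / N definition."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical sequences, for illustration only.
reference = ["sil", "dh", "ih", "s", "ih", "z", "sil"]
hypothesis = ["sil", "dh", "ih", "z", "ih", "sil"]
print(f"PER = {phone_error_rate(reference, hypothesis):.2f}%")
```

In this toy example the hypothesis contains one substitution and one deletion against seven reference phones, so the PER is 2/7, or about 28.57%.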

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
