A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection

Vyom Verma; Anish Benjwal; Amit Chhabra; Sunil K. Singh; Sudhakar Kumar; Brij B. Gupta; Varsha Arya; Kwok Tai Chui

doi:10.1038/s41598-023-49869-6

Scientific Reports (Dec 2023)

Affiliations

Vyom Verma: Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Anish Benjwal: Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Amit Chhabra: Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Sunil K. Singh: Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Sudhakar Kumar: Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology
Brij B. Gupta: Department of Computer Science and Information Engineering, Asia University
Varsha Arya: Department of Electrical and Computer Engineering, Lebanese American University
Kwok Tai Chui: Department of Electronic Engineering and Computer Science, School of Science and Technology, Hong Kong Metropolitan University (HKMU)

Journal volume & issue: Vol. 13, no. 1
pp. 1–17

Abstract

Voice is an essential component of human communication, serving as a fundamental medium for expressing thoughts, emotions, and ideas. Disruptions in vocal fold vibratory patterns can lead to voice disorders, which can have a profound impact on interpersonal interactions. Early detection of voice disorders is crucial for improving voice health and quality of life. This research proposes a novel methodology called VDDMFS [voice disorder detection using MFCC (Mel-frequency cepstral coefficients), fundamental frequency and spectral centroid], which combines an artificial neural network (ANN) trained on acoustic attributes with a long short-term memory (LSTM) model trained on MFCC attributes. The probabilities generated by the ANN and LSTM models are then stacked and used as input to XGBoost, which determines whether a voice is disordered; combining the two models in this way yields more accurate voice disorder detection. This approach achieved an accuracy of 95.67%, a sensitivity of 95.36%, a specificity of 96.49% and an F1 score of 96.9%, outperforming the existing techniques it was compared against.
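The final stage described in the abstract is a stacking step: the per-recording probabilities from the ANN and the LSTM become a two-column input to a meta-classifier, and the result is scored with accuracy, sensitivity, specificity and F1. The dependency-free Python sketch below illustrates that mechanism under stated assumptions. It is not the authors' implementation: a logistic regression trained by batch gradient descent is swapped in for XGBoost, every function name is hypothetical, label 1 is assumed to mean "disordered", and the input probabilities are made-up toy values rather than data from the paper.

```python
import math

# Sketch of a stacked-ensemble step: two base models each emit P(disordered)
# for every recording, the pair becomes a 2-D meta-feature, and a
# meta-classifier makes the final call. ASSUMPTION: a plain logistic regression
# stands in for XGBoost so the sketch runs without third-party packages.


def stack_probabilities(p_ann, p_lstm):
    """Pair each sample's ANN and LSTM probabilities into one meta-feature row."""
    if len(p_ann) != len(p_lstm):
        raise ValueError("both base models must score the same samples")
    return [[a, l] for a, l in zip(p_ann, p_lstm)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_meta(X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent
    (a hypothetical stand-in for XGBoost, not the paper's meta-learner)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean log-loss: (predicted - true) * input.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label 1 (disordered) when the meta-probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1, with 1 = disordered (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on disordered voices
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on healthy voices
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy usage with made-up base-model probabilities (NOT data from the paper).
p_ann = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15]
p_lstm = [0.95, 0.85, 0.90, 0.05, 0.15, 0.10]
labels = [1, 1, 1, 0, 0, 0]

X_meta = stack_probabilities(p_ann, p_lstm)
w, b = fit_logistic_meta(X_meta, labels)
preds = predict(w, b, X_meta)
print(binary_metrics(labels, preds))
```

In a typical stacking setup the meta-learner is trained on base-model probabilities for samples the base models did not see during their own training (for example, out-of-fold predictions), and the metrics are computed on a held-out test set. The toy run above scores the same six rows it was fitted on purely to show the mechanics.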

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
