IEEE Access (Jan 2024)
Improved Laryngeal Pathology Detection Based on Bottleneck Convolutional Networks and MFCC
Abstract
Automatic detection of laryngeal disorders through voice analysis enables early diagnosis. However, the effectiveness of AI-based detection methods is often limited, mainly because confidentiality constraints restrict the amount of available training data and because the wide range of pathologies hinders accurate detection. To address these issues, an automatic voice disorder detection (AVDD) system is proposed that employs a novel AI-based feature extraction approach to improve detection performance. The approach, termed MFCC-CBN, combines Mel-frequency cepstral coefficients (MFCC) with a convolutional bottleneck network (CBN). It also integrates a diverse feature set, including fundamental frequency (F0) perturbation measurements, glottal source features, and conventional MFCC features. The proposed approach is validated through comprehensive experiments on the public database of the Príncipe de Asturias University Hospital (HUPA), which contains recordings of sustained vowels. The method is tested with several classifiers, including Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), and the AVDD system is evaluated using 5-fold cross-validation. The results show that the method achieves a high detection rate and maintains stable performance regardless of the classifier used, which indicates good generalization. The optimal feature configuration surpasses state-of-the-art results, achieving a classification accuracy of 88.79% and an F1-score of 0.88.
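The evaluation protocol named above (5-fold cross-validation reporting accuracy and F1-score) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `stratified_kfold`, `accuracy_and_f1`, and `cross_validate` are hypothetical, stratification by class is an assumption (the abstract does not say how folds are drawn), and the F1-score is computed for a single positive class (label 1, assumed here to mean "pathological"), which may differ from the averaging used in the paper.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Stratification is an assumption of this sketch, not a detail from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) with F1 computed for the `positive` class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Average accuracy and F1 over k held-out folds.

    `fit(X, y)` returns a model; `predict(model, x)` returns a label.
    Both are caller-supplied placeholders for any classifier.
    """
    scores = []
    for held_out in stratified_kfold(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        y_pred = [predict(model, features[i]) for i in held_out]
        y_true = [labels[i] for i in held_out]
        scores.append(accuracy_and_f1(y_true, y_pred))
    mean_acc = sum(a for a, _ in scores) / k
    mean_f1 = sum(f for _, f in scores) / k
    return mean_acc, mean_f1
```

In practice `fit` and `predict` would wrap a classifier such as an SVM, RF, or XGBoost model trained on the extracted feature vectors; averaging the per-fold scores gives one accuracy and one F1 value per classifier, which is the kind of summary figure the abstract reports.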
Keywords