IEEE Access (Jan 2024)
StutterNet: Stuttering Disfluencies Detection in Synthetic Speech Signals via Mel Frequency Cepstral Coefficients Features Using Deep Learning
Abstract
Stuttering is a speech disorder characterised by the repetition, prolongation, or blocking of sounds, syllables, or words, which can cause significant social and emotional difficulties for those who experience it. To detect and diagnose stuttering early, clinicians and researchers must be able to distinguish stuttered speech from normal speech accurately. Doing so helps them understand the disorder, identify possible causes, and develop effective interventions and treatments. This study uses the UCLASS dataset, from which 40 MFCC features were extracted to evaluate how well different machine learning classifiers distinguish normal from stuttering speech. The major problem is that the UCLASS dataset is imbalanced; we address this with the synthetic minority oversampling technique (SMOTE). After the machine learning experiments, we propose a novel hybrid model that performs better than the individual machine learning classifiers. In the hybrid model, the lightweight StutterNet model extracts the best features from the data and then makes the prediction. The results indicate that the evaluated classifiers showed varying levels of performance. Overall, the results suggest that hybrid classifiers have the potential to accurately classify normal and stuttering speech, which could have important implications for the early identification and diagnosis of stuttering, as well as the development of assistive technologies and effective interventions and treatments.
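As an illustration of the oversampling idea named in the abstract, the following is a minimal sketch of SMOTE-style interpolation, not the implementation used in this study. The function name `smote_oversample`, its parameters, and the toy 2-dimensional vectors (standing in for 40-dimensional MFCC feature vectors) are assumptions made for the example. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: four 2-D points standing in for MFCC feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(minority, n_new=6)
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region already occupied by the minority class, which is the property that distinguishes this approach from simply duplicating samples.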
Keywords