IEEE Access (Jan 2021)

Voice Pathology Detection and Classification by Adopting Online Sequential Extreme Learning Machine

  • Fahad Taha Al-Dhief,
  • Marina Mat Baki,
  • Nurul Mu'azzah Abdul Latiff,
  • Nik Noordini Nik Abd. Malik,
  • Naseer Sabri Salim,
  • Musatafa Abbas Abbood Albader,
  • Nor Muzlifah Mahyuddin,
  • Mazin Abed Mohammed

DOI
https://doi.org/10.1109/ACCESS.2021.3082565
Journal volume & issue
Vol. 9
pp. 77293 – 77306

Abstract

Over the last decade, machine learning algorithms have become central to the analysis of voice disorders because they enable non-invasive voice pathology detection from the audio signal alone. Despite this, most recent voice pathology systems work on a limited acoustic database: they analyze a single vowel, such as /a/, and ignore sentences and other vowels. Accuracy and the time consumption of the algorithm are further key issues for such systems. The Online Sequential Extreme Learning Machine (OSELM) is a machine learning algorithm regarded as both rapid and accurate in classification. This paper therefore presents a voice pathology detection and classification system that uses OSELM as the classifier and Mel-frequency cepstral coefficients (MFCC) for feature extraction. The voice samples were taken from the Saarbrücken Voice Database (SVD). The system uses two parts of the database: the first includes all voices in SVD, with sentences and the vowels /a/, /i/, and /u/ uttered at high, low, and normal pitches; the second uses voice samples of three common pathologies (cyst, polyp, and paralysis) based on the vowel /a/ produced at normal pitch. In the experiments, OSELM achieved an accuracy of up to 91.17%, with 94% precision and 91% recall. Furthermore, OSELM obtained 87%, 87.55%, and 97.67% for F-measure, G-mean, and specificity, respectively. The proposed system also shows a strong ability to deliver detection and classification results in real-time clinical applications.
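The evaluation measures named in the abstract can be illustrated with a minimal sketch that derives them from binary confusion-matrix counts. The abstract does not give the paper's formulas, so the definitions below are the commonly used ones and are an assumption; in particular, G-mean is taken here as the square root of recall times specificity. The function name `binary_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary classification measures from confusion counts.

    Assumed (textbook) definitions, not confirmed by the abstract:
      precision   = TP / (TP + FP)
      recall      = TP / (TP + FN)
      specificity = TN / (TN + FP)
      F-measure   = harmonic mean of precision and recall
      G-mean      = sqrt(recall * specificity)
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts chosen only to exercise the function.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because each measure weighs errors differently, the values reported for one system need not move together across experiments.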

Keywords