AUTOMATIC IDENTIFICATION OF DYSPHONIAS USING MACHINE LEARNING ALGORITHMS

Miguel Angel BELLO RIVERA; Carlos Alberto REYES GARCÍA; Tania Cristal TALAVERA ROJAS; Perfecto Malaquías QUINTERO FLORES; Rodolfo Eleazar PÉREZ LOAIZA

doi:10.35784/acs-2023-32

Applied Computer Science (Dec 2023)

Affiliations

Miguel Angel BELLO RIVERA: Tecnológico Nacional de México
Carlos Alberto REYES GARCÍA: National Institute of Astrophysics, Optics, and Electronics (INAOE)
Tania Cristal TALAVERA ROJAS: La Universidad Autónoma de Asunción (UAA)
Perfecto Malaquías QUINTERO FLORES: El Tecnológico Nacional de México/Instituto Tecnológico de Apizaco
Rodolfo Eleazar PÉREZ LOAIZA: El Tecnológico Nacional de México/Instituto Tecnológico de Apizaco

Journal volume & issue: Vol. 19, no. 4

Abstract

Dysphonia is a prevalent symptom of some respiratory diseases that impairs voice quality, sometimes for prolonged periods. To diagnose it, speech-language pathologists use acoustic parameters to evaluate patients objectively and to determine which type of dysphonia affects them, for example hyperfunctional or hypofunctional dysphonia. The distinction matters because each type requires a different treatment. In artificial intelligence, the problem has been addressed by using acoustic parameters as input data to train machine learning and deep learning models. However, these models usually aim only to identify whether a patient is ill, performing binary classification between healthy voices and voices with dysphonia rather than distinguishing between types of dysphonia. In this paper, the harmonics-to-noise ratio, smoothed cepstral peak prominence, zero-crossing rate and the means of Mel-frequency cepstral coefficients 2 to 19 are used for multiclass classification of voices with euphony, hyperfunction and hypofunction by means of six machine learning algorithms: Random Forest, K-Nearest Neighbors, Logistic Regression, Decision Trees, Support Vector Machines and Naive Bayes. The .632 bootstrap was used to evaluate which algorithm best identifies the three voice classes. The best confidence interval for accuracy, 87% to 92%, was obtained with the K-Nearest Neighbors model. The results can be applied to the development of a complementary application for clinical diagnosis or patient monitoring under the supervision of a specialist.
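
The evaluation procedure named in the abstract can be illustrated with a minimal sketch. The abstract does not give the dataset, the hyperparameters or the exact .632 bootstrap variant used, so everything below is an assumption made for illustration: the two synthetic features stand in for the real acoustic parameters, k = 5 and 50 resampling rounds are arbitrary, and the per-round weighting (0.368 times the accuracy on the bootstrap sample plus 0.632 times the out-of-bag accuracy) is one common formulation rather than a confirmed description of the authors' method. The function names are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Return the majority label among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def bootstrap_632(X, y, n_rounds=50, k=5, seed=0):
    """Per-round .632 accuracy estimates (illustrative variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_rounds):
        # Draw n indices with replacement; unseen indices form the out-of-bag set.
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # no held-out points in this round, skip it
        boot_X = [X[i] for i in boot]
        boot_y = [y[i] for i in boot]
        oob_X = [X[i] for i in oob]
        oob_y = [y[i] for i in oob]
        acc_train = accuracy(boot_X, boot_y, boot_X, boot_y, k)
        acc_oob = accuracy(boot_X, boot_y, oob_X, oob_y, k)
        scores.append(0.368 * acc_train + 0.632 * acc_oob)
    return scores


def percentile_interval(scores, lower=0.025, upper=0.975):
    """Simple percentile interval over the per-round estimates."""
    ordered = sorted(scores)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


def make_synthetic(n_per_class=30, seed=1):
    """Synthetic 2-D Gaussian clusters; NOT the acoustic features of the paper."""
    rng = random.Random(seed)
    centers = {
        "euphony": (0.0, 0.0),
        "hyperfunction": (3.0, 0.0),
        "hypofunction": (0.0, 3.0),
    }
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


X, y = make_synthetic()
scores = bootstrap_632(X, y)
low, high = percentile_interval(scores)
print(f"95% percentile interval for .632 accuracy: {low:.3f} to {high:.3f}")
```

The printed interval describes only the synthetic data and says nothing about the 87% to 92% reported for the real voice recordings. Running the same loop once per algorithm would allow the resulting intervals to be compared side by side, which is the kind of comparison the abstract describes.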

Published in Applied Computer Science

ISSN: 1895-3735 (Print); 2353-6977 (Online)
Publisher: Polish Association for Knowledge Promotion
Country of publisher: Poland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.acs.pollub.pl/
