Laryngeal disease classification using voice data: Octave-band vs. mel-frequency filters

Jaemin Song; Hyunbum Kim; Yong Oh Lee

doi:10.1016/j.heliyon.2024.e40748

Heliyon (Dec 2024)

Affiliations

Jaemin Song: Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea
Hyunbum Kim: Department of Otolaryngology-Head and Neck Surgery, The Catholic University of Korea, Seoul, South Korea
Yong Oh Lee: Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea; Corresponding author. 94, Wausan-ro, Mapo-gu, Seoul, 04066, South Korea.

Journal volume & issue: Vol. 10, no. 24
p. e40748

Abstract

Introduction: Laryngeal cancer diagnosis relies on specialist examinations, but non-invasive methods using voice data are emerging with advances in artificial intelligence (AI). Mel Frequency Cepstral Coefficients (MFCCs) are widely used for voice analysis, but Octave Frequency Spectrum Energy (OFSE) may offer better accuracy in detecting subtle voice changes.

Problem statement: Accurate early diagnosis of laryngeal cancer through voice data is challenging with current methods such as MFCC.

Objectives: This study compares the effectiveness of MFCC and OFSE in classifying voice data into healthy, laryngeal cancer, benign mucosal disease, and vocal fold paralysis categories.

Methods: Voice samples from 363 patients were analyzed using convolutional neural network (CNN) models, employing MFCC and OFSE with 1/3 octave band filters. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to visualize key voice features.

Results: OFSE with 1/3 octave band filters outperformed MFCC in classification accuracy, especially in multi-class classification including the laryngeal cancer, benign mucosal disease, and vocal fold paralysis groups (0.9398 ± 0.0232 vs. 0.7061 ± 0.0561). Grad-CAM analysis revealed that OFSE with 1/3 octave band filters effectively distinguished laryngeal cancer from healthy voices by focusing on increased noise in the over-formant area and changes in the fundamental frequency. The analysis also highlighted that specific narrow frequency areas were critical for classification, particularly in vocal fold paralysis. Benign mucosal diseases occasionally resembled healthy voices, which makes AI differentiation between benign conditions and laryngeal cancer a significant challenge.

Conclusion: OFSE with 1/3 octave band filters provides superior accuracy in diagnosing laryngeal diseases including laryngeal cancer, showing potential for non-invasive, AI-driven early detection.
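As an illustration of the 1/3 octave band layout named in the Methods, the following is a minimal sketch of how per-band spectral energy could be computed for a single audio frame. It is not the authors' implementation: the function names, the 50 Hz to 8000 Hz range, the 1000 Hz reference frequency, and the use of a plain FFT power spectrum are all assumptions chosen for illustration.

```python
import numpy as np


def third_octave_bands(f_min=50.0, f_max=8000.0, f_ref=1000.0):
    """Return (lower, center, upper) frequency arrays for base-2 1/3-octave bands.

    Hypothetical helper: the range and reference frequency are illustrative
    defaults, not values taken from the paper.
    """
    # Band index n has center f_ref * 2**(n/3); keep centers inside [f_min, f_max].
    n_lo = int(np.ceil(3 * np.log2(f_min / f_ref)))
    n_hi = int(np.floor(3 * np.log2(f_max / f_ref)))
    n = np.arange(n_lo, n_hi + 1)
    center = f_ref * 2.0 ** (n / 3.0)
    # Each band spans one sixth of an octave on either side of its center.
    lower = center * 2.0 ** (-1.0 / 6.0)
    upper = center * 2.0 ** (1.0 / 6.0)
    return lower, center, upper


def band_energies(signal, sample_rate, **band_kwargs):
    """Sum FFT power inside each 1/3-octave band for one frame of audio."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    lower, center, upper = third_octave_bands(**band_kwargs)
    energies = np.array(
        [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in zip(lower, upper)]
    )
    return center, energies
```

Calling `band_energies` on successive frames of a recording and stacking the results would give a band-by-time energy map that could serve as CNN input, in the same way an MFCC matrix does. Unlike the mel scale, every band here has the same width on a logarithmic frequency axis, so adjacent band edges always differ by a factor of 2^(1/3).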

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
