IEEE Access (Jan 2023)

A Modular Deep Learning Architecture for Voice Pathology Classification

  • Ioanna Miliaresi,
  • Aggelos Pikrakis

DOI
https://doi.org/10.1109/ACCESS.2023.3300795
Journal volume & issue
Vol. 11
pp. 80465 – 80478

Abstract


Developing methods that combine different sources of information for medical diagnosis is a central challenge in medical informatics. In this context, we introduce a machine-learning framework for automatic voice pathology classification and, in particular, a modular deep learning architecture that classifies voice signals stemming from four types of voice disorders. To this end, we design a multimodal deep learning architecture that fuses medical metadata with voice signals. Our classifier combines fully convolutional and feed-forward sub-networks that simultaneously process low-level features extracted from acoustic signals of varying duration and mid-level features extracted from medical records. A key objective of our study is an architecture capable of processing voice samples of varying duration, which enhances the system’s learning and inference capabilities. Our research also focuses on overcoming the performance limitations of neural networks that stem from the lack of large volumes of training data. We therefore investigate problem-specific augmentation techniques based on feature sequence segmentation and coloured noise injection, and we show that the proposed method gives state-of-the-art results, achieving 64.4% classification accuracy, compared to the 63.5% classification score of the best-performing method of the 2019 FEMH data challenge.
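
The two augmentation ideas named in the abstract, coloured noise injection and feature sequence segmentation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the spectral-shaping approach to generating coloured noise, the signal-to-noise ratio, and the segment and hop lengths are all assumptions chosen for illustration.

```python
import numpy as np


def coloured_noise(n_samples, exponent, rng):
    """Noise whose power spectrum falls off roughly as 1 / f**exponent.

    exponent=0 gives white noise, 1 pink noise, 2 brown noise.
    Assumption: spectral shaping of white noise; the paper may differ.
    """
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid division by zero at the DC bin
    spectrum = spectrum / freqs ** (exponent / 2.0)
    noise = np.fft.irfft(spectrum, n=n_samples)
    # Normalise to unit power so the caller can set the noise level.
    return noise / np.sqrt(np.mean(noise ** 2))


def inject_noise(signal, snr_db, exponent, rng):
    """Add coloured noise to a signal at a chosen signal-to-noise ratio."""
    noise = coloured_noise(len(signal), exponent, rng)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + np.sqrt(noise_power) * noise


def segment_features(features, seg_len, hop):
    """Split a (frames, dims) feature sequence into overlapping segments."""
    starts = range(0, features.shape[0] - seg_len + 1, hop)
    return [features[s:s + seg_len] for s in starts]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    voice_like = np.sin(2 * np.pi * 220.0 * t)  # stand-in for a voice sample
    noisy = inject_noise(voice_like, snr_db=10.0, exponent=1.0, rng=rng)

    # Stand-in for a sequence of 100 frames with 13 features each.
    feats = rng.standard_normal((100, 13))
    segments = segment_features(feats, seg_len=40, hop=20)
    print(len(segments), segments[0].shape)
```

Each segment or noisy copy can serve as an additional training example that inherits the label of the recording it came from, which is one common way such augmentations enlarge a small training set.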

Keywords