Fused Audio Instance and Representation for Respiratory Disease Detection

Tuan Truong; Matthias Lenga; Antoine Serrurier; Sadegh Mohammadi

doi:10.3390/s24196176

Sensors (Sep 2024)


Affiliations

Tuan Truong: Bayer AG, 13353 Berlin, Germany
Matthias Lenga: Bayer AG, 13353 Berlin, Germany
Antoine Serrurier: Clinic for Phoniatrics, Pedaudiology and Communication Disorders, University Hospital of RWTH Aachen, 52074 Aachen, Germany
Sadegh Mohammadi: Bayer AG, 13353 Berlin, Germany

DOI: https://doi.org/10.3390/s24196176
Journal volume & issue: Vol. 24, no. 19
p. 6176

Abstract


Audio-based classification of body sounds has long been studied as an aid to diagnosing respiratory diseases. While most research uses cough as the main acoustic biomarker, other body sounds may also carry information useful for detecting respiratory diseases. Recent studies on coronavirus disease 2019 (COVID-19) have suggested that breath and speech sounds, in addition to cough, correlate with the disease. We propose fused audio instance and representation (FAIR) as a method for respiratory disease detection. FAIR constructs a joint feature vector from multiple body sounds, each represented in waveform and spectrogram form. We evaluate FAIR on the use case of COVID-19 detection by combining the waveform and spectrogram representations of body sounds. Using self-attention to combine features extracted from cough, breath, and speech sounds yields the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Models that use both representations achieve a higher AUC than models trained solely on spectrograms or solely on waveforms, indicating that combining the two representations enriches the extracted features. Although this study focuses on COVID-19, FAIR is flexible enough to combine various multi-modal and multi-instance features in other diagnostic applications, potentially leading to more accurate diagnoses across a wider range of diseases.
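The abstract says that FAIR fuses features from cough, breath, and speech, each in waveform and spectrogram form, using self-attention. It does not specify the architecture. The following is a minimal NumPy sketch of that idea and rests on several assumptions. Each (sound, representation) pair is assumed to be already encoded into a fixed-length vector of equal dimension. Each vector is treated as one token in single-head scaled dot-product self-attention. The attended tokens are mean-pooled and passed to a logistic head. The embedding size, random weights, pooling choice, and function names are hypothetical illustrations, not the authors' model. The sketch also shows how the reported sensitivity and specificity are defined for binary predictions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_fusion(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over feature tokens.

    tokens: (n_tokens, d) array, one row per (sound, representation) vector.
    Returns the mean-pooled attended vector and the attention weight matrix.
    """
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # (n_tokens, n_tokens), rows sum to 1
    attended = weights @ v              # each token mixes information from all others
    return attended.mean(axis=0), weights


def sensitivity_specificity(y_true, y_pred):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical usage: random placeholder embeddings stand in for encoder outputs
rng = np.random.default_rng(0)
SOUNDS = ("cough", "breath", "speech")
REPRESENTATIONS = ("waveform", "spectrogram")
d = 16  # assumed common embedding size
features = {(s, r): rng.normal(size=d) for s in SOUNDS for r in REPRESENTATIONS}
tokens = np.stack([features[(s, r)] for s in SOUNDS for r in REPRESENTATIONS])  # (6, d)

w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
fused, attn = self_attention_fusion(tokens, w_q, w_k, w_v)

w_out = rng.normal(size=d)  # hypothetical logistic classification head
prob = 1.0 / (1.0 + np.exp(-(fused @ w_out)))
```

In this sketch, attention lets each sound/representation token be reweighted by its relevance to the others before pooling. A plain concatenation would fix that weighting instead.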

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
