Sensors (Mar 2024)

Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks

  • Andrés Eduardo Castro-Ospina,
  • Miguel Angel Solarte-Sanchez,
  • Laura Stella Vega-Escobar,
  • Claudia Isaza,
  • Juan David Martínez-Vargas

DOI
https://doi.org/10.3390/s24072106
Journal volume & issue
Vol. 24, no. 7
p. 2106

Abstract

Sound classification plays a crucial role in enhancing the interpretation, analysis, and use of acoustic data, leading to a wide range of practical applications, of which environmental sound analysis is one of the most important. In this paper, we explore the representation of audio data as graphs in the context of sound classification. We propose a methodology that leverages pre-trained audio models to extract deep features from audio files, which are then employed as node information to build graphs. Subsequently, we train various graph neural networks (GNNs), specifically graph convolutional networks (GCNs), GraphSAGE, and graph attention networks (GATs), to solve multi-class audio classification problems. Our findings underscore the effectiveness of employing graphs to represent audio data. Moreover, they highlight the competitive performance of GNNs in sound classification tasks, with the GAT model emerging as the top performer, achieving a mean accuracy of 83% in classifying environmental sounds and 91% in identifying the land cover of a site based on its audio recording. In conclusion, this study provides novel insights into the potential of graph representation learning techniques for analyzing audio data.
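The abstract's pipeline (pre-trained features as node attributes, aggregated by a GNN) can be illustrated with a minimal NumPy sketch of a single-head graph attention layer, the mechanism behind the GAT model named above. The toy graph, feature sizes, and random weights are illustrative assumptions only, not the paper's actual architecture or data:

```python
import numpy as np

def gat_layer(X, A, W, a, alpha=0.2):
    """Single-head graph attention layer (GAT-style).

    X: (N, F) node features, e.g. deep embeddings from a pre-trained audio model
    A: (N, N) adjacency matrix (self-loops included)
    W: (F, F') learnable projection; a: (2*F',) learnable attention vector
    """
    H = X @ W                                   # project node features
    Fp = H.shape[1]
    # attention logit e[i, j] = LeakyReLU(a . [H_i || H_j]), split into two dot products
    src = H @ a[:Fp]
    dst = H @ a[Fp:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, alpha * e)           # LeakyReLU
    e = np.where(A > 0, e, -np.inf)             # mask non-neighbors
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)  # softmax over each node's neighborhood
    return att @ H                              # attention-weighted neighbor aggregation

# hypothetical toy graph: 3 audio clips as nodes, edges between similar clips
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))                     # stand-in for pre-trained audio embeddings
W = rng.normal(size=(4, 2))
a = rng.normal(size=(4,))
out = gat_layer(X, A, W, a)                     # (3, 2) updated node representations
```

In practice a library such as PyTorch Geometric would supply multi-head, trainable versions of this layer; the sketch only shows why attention lets each node weight its neighbors' features unevenly, which is the property the abstract credits for GAT's top performance.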

Keywords