SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification

Jongpil Lee; Jiyoung Park; Keunhyoung Luke Kim; Juhan Nam

doi:10.3390/app8010150

Applied Sciences (Jan 2018)


Affiliations

Jongpil Lee: Graduate School of Culture Technology, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
Jiyoung Park: Graduate School of Culture Technology, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
Keunhyoung Luke Kim: Graduate School of Culture Technology, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
Juhan Nam: Graduate School of Culture Technology, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea

Journal volume & issue: Vol. 8, no. 1, p. 150

Abstract


Convolutional Neural Networks (CNNs) have been applied end-to-end to raw data of different modalities across diverse machine learning tasks. In the audio domain, raw waveform-based approaches have been explored to learn the hierarchical characteristics of audio directly. However, most previous studies have limited model capacity by adopting a frame-level structure similar to the short-time Fourier transform. We previously proposed a CNN architecture that learns representations using sample-level filters, going beyond typical frame-level input representations. That architecture achieved performance comparable to a spectrogram-based CNN model in music auto-tagging. In this paper, we extend the previous work in three ways. First, since the sample-level model requires a much longer training time, we progressively downsample the input signals and examine how this affects performance. Second, we extend the model using a multi-level and multi-scale feature aggregation technique and then conduct transfer learning for several music classification tasks. Finally, we visualize the filters learned by the sample-level CNN in each layer to identify hierarchically learned features and show that they are sensitive to log-scaled frequency.
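To make the contrast between sample-level and frame-level filtering concrete, the following is a minimal Python sketch that only tracks temporal lengths (no weights are learned). It assumes a hypothetical stack in which the first layer is a strided convolution with kernel size and stride 3, and every later layer is a size-3 convolution with "same" padding followed by size-3 max-pooling. Treat this layout as an assumption rather than the exact published configuration. The function names `trace_sample_level_stack` and `summarize`, the 22050 Hz sample rate, and the 59049-sample input are illustrative choices that do not appear in this abstract.

```python
def trace_sample_level_stack(num_samples: int, filter_size: int = 3) -> list[int]:
    """Return the temporal length before and after each layer of a sample-level stack.

    Assumed (hypothetical) layout:
      layer 1: conv with kernel = stride = filter_size (strided, no pooling)
      later layers: conv with kernel filter_size, stride 1, 'same' padding,
                    then max-pooling with size = stride = filter_size.
    Under this layout every layer divides the length by filter_size.
    """
    if num_samples < filter_size:
        raise ValueError("input is shorter than a single filter")
    lengths = [num_samples]  # index 0 is the raw waveform length
    length = num_samples
    while length > 1:
        length //= filter_size  # strided conv, or conv followed by max-pool
        lengths.append(length)
    return lengths


def summarize(num_samples: int, sample_rate: int, filter_size: int = 3) -> tuple[int, float, float]:
    """Return (number of layers, clip duration in ms, first-filter span in ms)."""
    num_layers = len(trace_sample_level_stack(num_samples, filter_size)) - 1
    duration_ms = 1000.0 * num_samples / sample_rate
    first_filter_ms = 1000.0 * filter_size / sample_rate  # a few samples, far below a frame
    return num_layers, duration_ms, first_filter_ms


if __name__ == "__main__":
    # Hypothetical clip: 59049 = 3**10 samples at 22050 Hz.
    print(trace_sample_level_stack(59049))
    print(summarize(59049, 22050))
    # Same duration after downsampling by 3 (7350 Hz): fewer samples, one layer fewer.
    print(summarize(19683, 7350))
```

Under these assumptions, 59049 samples at 22050 Hz (about 2678 ms) are reduced to a single frame after ten layers, and the first filter spans only about 0.136 ms, whereas a 512-sample STFT frame at the same rate spans about 23.2 ms. The same clip downsampled to 7350 Hz has 19683 samples and needs one layer fewer, which hints at why downsampling the input can reduce training cost.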

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
