Applied Mathematics and Nonlinear Sciences (Jan 2024)
Music genre classification using deep learning: a comparative analysis of CNNs and RNNs
Abstract
Music categorization is a key component of music information retrieval and an important means of handling massive amounts of music data. This paper proposes a classification method that uses deep learning algorithms to identify music genres automatically. The audio is first denoised with wavelet thresholding and then augmented by combining audio-feature and spectral-image-feature methods. Audio features are extracted with the short-time Fourier transform and the Mel transform, and a convolutional neural network (CNN) and a recurrent neural network (RNN) are each used as the classifier to construct the music genre classification model. Experiments on the GTZAN dataset examine how data augmentation and feature extraction affect classification performance and compare the CNN and RNN models. When the audio-feature and image-feature augmentation methods are used together, the classification accuracy of the model exceeds 90%, and the model that extracts audio features with the short-time Fourier transform classifies better. The CNN model outperforms the RNN model by 1.3% in music genre classification, and this result holds even when the audio is subjected to speed and pitch changes and structural changes, so the CNN model is the better choice for music genre classification.
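As a rough illustration of the feature-extraction step summarized above, the following is a minimal sketch of a short-time Fourier transform magnitude spectrogram, together with the commonly used Hz-to-Mel mapping. It is not the paper's implementation: the function names, the Hann window, the frame length of 64, the hop of 32, and the 1000 Hz test tone are all assumptions chosen for illustration, and a naive DFT stands in for the FFT a practical pipeline would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one list of frame_len // 2 + 1 magnitudes per frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame before transforming it.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def hz_to_mel(freq_hz):
    """Common Hz-to-Mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Hypothetical input: a 1000 Hz sine sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = stft_magnitude(tone)
# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

In this sketch each frame of `spec` is one column of the spectrogram, and the bin spacing is the sample rate divided by the frame length. A Mel spectrogram would additionally pool these bins with triangular filters spaced evenly on the scale given by `hz_to_mel`.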
Keywords