IEEE Access (Jan 2025)
Music Generation Using Deep Learning and Generative AI: A Systematic Review
Abstract
This paper presents a systematic review of recent advances in music generation using deep learning, categorizing the latest research in the field and identifying the key contributions of each approach. The study examines common data representations in music generation, including raw waveforms, spectrograms, and MIDI, alongside the most prominent deep learning architectures: Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Transformer-based models. Through a comparative analysis, the paper highlights the strengths and limitations of these approaches. The findings suggest that GANs operating on spectrograms and RNNs operating on MIDI data are particularly effective for generating multi-track music, while autoregressive models such as MusicGen and other Transformer-based models demonstrate superior performance in capturing long-term dependencies. The paper also underscores the emergence of diffusion models, which are gaining popularity for generating high-quality, complex musical outputs. The major contributions of this review are the identification of the best-performing models for various music generation tasks and a comprehensive discussion of data representation methods, evaluation metrics, and future research directions.
Keywords