Improving Audio Recognition With Randomized Area Ratio Patch Masking: A Data Augmentation Perspective

Weichun Wong; Yachun Li; Shihan Li

doi:10.1109/ACCESS.2024.3475508

IEEE Access (Jan 2024)

Affiliations

Weichun Wong: ORCiD; Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan
Yachun Li: Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan
Shihan Li: ORCiD; Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, Taiwan

DOI: https://doi.org/10.1109/ACCESS.2024.3475508
Journal volume & issue: Vol. 12, pp. 172548–172561

Abstract

In audio recognition, improving the accuracy and generalizability of Pretrained Audio Neural Networks (PANNs) remains challenging. This study introduces Randomized Area Ratio Patch Masking (RARPM), a novel data augmentation technique that applies random patches with varying transparency to log mel spectrograms during training. The method, which is optimized for the MobileNetV1 architecture, aims to enhance model learning by diversifying the training data. The study validates RARPM on the AudioSet dataset, which comprises over two million labeled sound clips. RARPM achieves a mean average precision (mAP) of 0.385, surpassing the SpecAugment baseline’s mAP of 0.366. This research contributes a new strategy for data augmentation, demonstrating significant improvements in audio recognition tasks and paving the way for more robust models applicable across diverse architectures.
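The abstract describes the augmentation only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a (time, mel) NumPy array as input and blends randomly placed rectangular patches into it with a random transparency. The function name `random_patch_mask`, the parameters `max_area_ratio`, `patch_shape`, and `alpha_range`, the mean-value fill, and the reading of "randomized area ratio" as a freshly drawn coverage fraction per call are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def random_patch_mask(spec, max_area_ratio=0.3, patch_shape=(8, 8),
                      alpha_range=(0.0, 1.0), rng=None):
    """Blend randomly placed rectangular patches into a (time, mel) array.

    Illustrative sketch only: all parameter names and defaults are
    assumptions, not values taken from the RARPM paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=np.float32, copy=True)  # leave the input untouched
    n_time, n_mel = out.shape
    ph, pw = min(patch_shape[0], n_time), min(patch_shape[1], n_mel)

    # Assumed reading of "randomized area ratio": draw the fraction of the
    # spectrogram to cover anew on every call.
    area_ratio = rng.uniform(0.0, max_area_ratio)
    n_patches = int(area_ratio * n_time * n_mel / (ph * pw))

    fill = out.mean()  # assumed fill value for the patches
    for _ in range(n_patches):
        t0 = rng.integers(0, n_time - ph + 1)
        f0 = rng.integers(0, n_mel - pw + 1)
        # alpha = 1 leaves the region unchanged, alpha = 0 replaces it fully.
        alpha = rng.uniform(*alpha_range)
        region = out[t0:t0 + ph, f0:f0 + pw]
        out[t0:t0 + ph, f0:f0 + pw] = alpha * region + (1.0 - alpha) * fill
    # Patches may overlap, so the covered area is at most area_ratio.
    return out
```

In a training pipeline, a function like this would typically be applied to each log mel spectrogram on the fly, for example `augmented = random_patch_mask(log_mel)`, so that the model sees a differently masked version of each clip in every epoch.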

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
