IEEE Access (Jan 2024)

On the Acoustic-Based Recognition of Multiple Objects Using Overlapped Impact Sounds

  • Van-Thuan Tran,
  • Wei-Ho Tsai

DOI
https://doi.org/10.1109/ACCESS.2024.3459423
Journal volume & issue
Vol. 12
pp. 135651 – 135666

Abstract

In a recent study, we showed that the impact sounds generated when objects fall freely and strike a plane can be used for acoustic-based object recognition (AOR). Building on this finding, we address the practical scenario in which multiple objects fall simultaneously, producing overlapped impact sounds. This investigation is the starting point of our work on recognizing multiple objects from overlapped impact sounds, which we term acoustic-based multiple object recognition (AMOR). To tackle the challenges that overlapped sounds pose for AOR, we propose a novel two-branch network with overlap awareness, named OANet. The network integrates transformer encoding blocks, and its first branch is pre-trained with supervised contrastive learning to predict the number of objects. The outputs of the first and second branches are combined and fed into a classifier for the AMOR task. We conducted experiments and evaluations on a self-collected dataset of 10 distinct LEGO objects, including instances in which up to three objects fell and hit the designated plane almost concurrently. The dataset contains both non-overlapped and overlapped samples across 175 classes, or object combinations (i.e., 10 individual objects, 45 combinations of two objects, and 120 combinations of three objects). Experimental results demonstrate the viability of AMOR and its potential for accurate recognition in complex scenarios. Notably, OANet exhibits superior performance and robustness compared with baseline networks without overlap awareness. Furthermore, in more complex scenarios with mixed overlapped datasets containing samples of up to 5, 7, and 10 objects, the proposed OANet consistently achieved accuracies surpassing 91%, which were 14-24% higher than those of the baseline models. Other important findings include the effectiveness of feature combination, of the supervised contrastive learning approach for feature learning in the overlap detection network, and of using mixed data during the training phase.
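
The class count quoted in the abstract (175 = 10 + 45 + 120) follows from choosing one, two, or three objects out of ten without regard to order. The short Python sketch below is only an illustration of that arithmetic and is not taken from the paper: the object labels and the helper name `enumerate_classes` are hypothetical, and it assumes that each class is an unordered set of distinct objects.

```python
from itertools import combinations
from math import comb


def enumerate_classes(objects, max_simultaneous):
    """Return every unordered combination of 1..max_simultaneous distinct objects.

    Hypothetical helper for illustration; the paper does not describe
    how its class labels are actually constructed.
    """
    classes = []
    for k in range(1, max_simultaneous + 1):
        classes.extend(combinations(objects, k))
    return classes


# Placeholder labels standing in for the 10 LEGO objects (assumed names).
objects = [f"obj{i}" for i in range(10)]

# At most three objects fall together in the main dataset.
classes = enumerate_classes(objects, max_simultaneous=3)

# Per-size counts: C(10,1), C(10,2), C(10,3).
counts = [comb(len(objects), k) for k in range(1, 4)]

print(counts, len(classes))  # [10, 45, 120] 175
```

Under the same assumption, changing `max_simultaneous` gives the number of possible object combinations for a larger cap, although the abstract does not state how classes are defined in the 5-, 7-, and 10-object experiments.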

Keywords