Environmental Sound Recognition Using Time-Frequency Intersection Patterns

Xuan Guo; Yoshiyuki Toyoda; Huankang Li; Jie Huang; Shuxue Ding; Yong Liu

doi:10.1155/2012/650818

Applied Computational Intelligence and Soft Computing (Jan 2012)

Affiliations

Xuan Guo: Graduate Department of Computer and Information Systems, Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Yoshiyuki Toyoda: Graduate Department of Computer and Information Systems, Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Huankang Li: Department of Computer Science and Engineering, Shanghai Jiaotong University, 200240 Shanghai, China
Jie Huang: Graduate Department of Computer and Information Systems, Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Shuxue Ding: Graduate Department of Computer and Information Systems, Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Yong Liu: Graduate Department of Computer and Information Systems, Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan

Journal volume & issue: Vol. 2012

Abstract

Environmental sound recognition is an important function for robots and intelligent computer systems. In this research, we use a multistage perceptron neural network system for environmental sound recognition. The input is a time-frequency intersection pattern: the time-variance pattern of instantaneous power combined with the frequency-variance pattern given by the instantaneous spectrum at the power peak. The spectra of many environmental sounds change more slowly than those of speech or voice, so this intersection pattern preserves the major features of environmental sounds while drastically reducing the amount of data required. Two experiments were conducted, one using an original database and one using an open database created by the RWCP project. The recognition rate for 20 kinds of environmental sounds was 92%. The recognition rate of the new method was about 12% higher than that of methods using only an instantaneous spectrum. The results are also comparable with those of HMM-based methods, although those methods must handle the time variance of an input vector series with more complicated computations.
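As an illustration of the input described in the abstract, the following is a minimal sketch of how a time-frequency intersection pattern might be assembled with NumPy. The abstract does not specify frame length, hop size, windowing, pattern length, or normalization, so the function name `intersection_pattern`, its parameters, the Hann window, and the max-normalization are all assumptions made here for illustration and are not the authors' implementation.

```python
import numpy as np


def intersection_pattern(signal, frame_len=256, hop=128, n_time=32):
    """Hypothetical sketch: join a power-over-time pattern with the
    magnitude spectrum of the frame where the power peaks.

    All parameter values are assumptions; the abstract gives none.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one frame")

    # Slice the signal into overlapping frames, shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]

    # Time pattern: mean power of each frame.
    power = np.mean(frames ** 2, axis=1)
    peak = int(np.argmax(power))

    # Frequency pattern: Hann-windowed magnitude spectrum at the power peak.
    spectrum = np.abs(np.fft.rfft(frames[peak] * np.hanning(frame_len)))

    # Resample the power pattern to a fixed length (assumed) so that the
    # classifier input has the same size for every sound.
    t_old = np.linspace(0.0, 1.0, n_frames)
    t_new = np.linspace(0.0, 1.0, n_time)
    time_pattern = np.interp(t_new, t_old, power)

    # Scale each part by its maximum (assumed normalization).
    time_pattern = time_pattern / (time_pattern.max() + 1e-12)
    spectrum = spectrum / (spectrum.max() + 1e-12)

    return np.concatenate([time_pattern, spectrum]), peak


# Usage: a synthetic decaying 1 kHz burst sampled at 16 kHz.
fs = 16000
t = np.arange(4000 - 1024) / fs
burst = np.exp(-t / 0.05) * np.sin(2 * np.pi * 1000 * t)
example = np.concatenate([np.zeros(1024), burst])
features, peak_frame = intersection_pattern(example)
```

With these assumed settings the feature vector has `n_time + frame_len // 2 + 1` entries, far fewer than a full spectrogram of the same sound, which is the data reduction the abstract refers to. The resulting vector would then be passed to a classifier such as the perceptron network mentioned above.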

Published in Applied Computational Intelligence and Soft Computing

ISSN: 1687-9724 (Print); 1687-9732 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/4795
