MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Qin Li; Yuze Yang; Tianxiang Lan; Huifeng Zhu; Qi Wei; Fei Qiao; Xinjun Liu; Huazhong Yang

doi:10.1109/ACCESS.2020.2979799

IEEE Access (Jan 2020)

Affiliations

Qin Li: Department of Electronic Engineering, Tsinghua University, Beijing, China
Yuze Yang: Department of Electronic Engineering, Tsinghua University, Beijing, China
Tianxiang Lan: Department of Electronic Engineering, Tsinghua University, Beijing, China
Huifeng Zhu: Department of Electronic Engineering, Tsinghua University, Beijing, China
Qi Wei: Department of Electronic Engineering, Tsinghua University, Beijing, China
Fei Qiao: Department of Electronic Engineering, Tsinghua University, Beijing, China
Xinjun Liu: Department of Mechanical Engineering, Tsinghua University, Beijing, China
Huazhong Yang: Department of Electronic Engineering, Tsinghua University, Beijing, China

DOI: https://doi.org/10.1109/ACCESS.2020.2979799
Journal volume & issue: Vol. 8
pp. 48720 – 48730

Abstract

Feature extraction is an essential part of automatic speech recognition (ASR): it compresses raw speech data and enhances features. Conventional implementations in the digital domain, however, have run into energy-consumption and processing-speed bottlenecks. We therefore propose a Mixed-Signal Processing (MSP) architecture that efficiently extracts Mel-Frequency Cepstrum Coefficients (MFCC) features. MSP-MFCC pre-processes speech signals in the analog domain, which significantly reduces the cost of the analog-to-digital converter (ADC) as well as the computational complexity of the digital back-end. Moreover, by restructuring the processing flow, MSP-MFCC eliminates the time-consuming Fourier transform used in the conventional digital realization. We fabricated the analog part in a 180 nm CMOS mixed-signal technology and measured the chip. The measured results show that MSP-MFCC consumes $0.72~\mu\text{J}$/frame and that its processing time is as low as $45.79~\mu\text{s}$/frame. Compared with the state of the art, MSP-MFCC achieves a 95% energy saving and a speedup of about $6.4\times$. Furthermore, using the features extracted by MSP-MFCC, speech recognition simulation reaches an accuracy of 98.2%, which keeps it among the leading performers relative to current counterparts. The proposed MFCC extractor is competitive for integration in ultra-low-power, always-on wearable speech recognition applications.
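
The headline figures in the abstract can be related to one another with a few lines of arithmetic. The sketch below is a back-of-envelope reading, not a result reported by the authors: it assumes that the 95% energy saving and the roughly 6.4x speedup are both measured against the same per-frame baseline, and that frames are processed back to back with no idle time. The function names are hypothetical and are used for illustration only.

```python
# Figures quoted in the abstract (measured MSP-MFCC results).
ENERGY_UJ_PER_FRAME = 0.72   # microjoules per frame
TIME_US_PER_FRAME = 45.79    # microseconds per frame
ENERGY_SAVING = 0.95         # "95% energy saving"
SPEEDUP = 6.4                # "about 6.4x speedup"


def implied_baseline_energy_uj(energy_uj: float, saving: float) -> float:
    """Baseline energy per frame implied by a fractional energy saving.

    Assumption: saving = 1 - (proposed energy / baseline energy).
    """
    return energy_uj / (1.0 - saving)


def implied_baseline_time_us(time_us: float, speedup: float) -> float:
    """Baseline time per frame implied by a speedup factor.

    Assumption: speedup = baseline time / proposed time.
    """
    return time_us * speedup


def max_frames_per_second(time_us: float) -> float:
    """Upper bound on throughput if frames are processed back to back."""
    return 1e6 / time_us


if __name__ == "__main__":
    base_e = implied_baseline_energy_uj(ENERGY_UJ_PER_FRAME, ENERGY_SAVING)
    base_t = implied_baseline_time_us(TIME_US_PER_FRAME, SPEEDUP)
    fps = max_frames_per_second(TIME_US_PER_FRAME)
    print(f"implied baseline energy: {base_e:.1f} uJ/frame")
    print(f"implied baseline time:   {base_t:.1f} us/frame")
    print(f"max throughput:          {fps:.0f} frames/s")
```

Under these assumptions the comparison point works out to roughly 14.4 µJ/frame and 293 µs/frame. Because the abstract gives the speedup only as "about" 6.4x, these values are approximate.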

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
