Измерение, мониторинг, управление, контроль (Dec 2022)

NOVEL APPROACH BASED ON TIME-FREQUENCY ANALYSIS FOR SEGMENTATION OF SPEECH SIGNALS

  • A.K. Alimuradov,
  • A.Yu. Tychkov,
  • P.P. Churakov,
  • D.S. Dudnikov

DOI
https://doi.org/10.21685/2307-5538-2022-4-11
Journal volume & issue
no. 4

Abstract

Background. The accuracy of speech signal segmentation depends directly on the parameters used to locate the beginning and end of informative fragments in a continuous speech stream. The purpose of this work is to improve the efficiency of speech/pause segmentation through time-frequency analysis of speech signals. The object of research is the set of parameters that describe speech characteristics in the frequency and time domains. The subject of research is the relevance of these informative parameters to the task of speech/pause segmentation.

Materials and methods. Short-term analysis of the spectral and energy characteristics of speech was performed using the discrete Fourier transform and the Teager energy operator. The proposed method was implemented in the MATLAB mathematical modeling environment from MathWorks.

Results. A novel approach to speech/pause segmentation is proposed, based on analysis of the mean frequency (in the frequency domain) and the short-term energy of the Teager operator function (in the time domain). The approach is distinguished by an auxiliary algorithm that corrects speech/pause segmentation errors, developed on the basis of how the respiratory organs function physiologically during the formation of a continuous speech stream. A brief overview of the informative parameters of speech signals used for speech/pause segmentation is presented, and the proposed approach is described in detail. The proposed approach is compared with known speech/pause segmentation methods on clean and noisy speech signals.

Conclusions. The findings show that methods based on the proposed approach achieve the best speech/pause segmentation results for clean and noisy speech signals; that the ratio of the short-term energy of the Teager operator function to the mean frequency is the informative parameter most relevant to the segmentation problem; and that the auxiliary algorithm for correcting false states improves segmentation efficiency.
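To make the two parameters named in the abstract concrete, the following Python sketch computes a per-frame Teager energy, a power-weighted mean frequency, and their ratio. It is an illustration under stated assumptions and not the authors' MATLAB implementation: the abstract does not give exact definitions, so the mean-square form of the short-term energy, the power-weighted mean frequency, the non-overlapping frames, the fixed threshold, and all function names are hypothetical choices. The auxiliary error-correction algorithm is not sketched because the abstract does not describe its rules.

```python
import cmath
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def short_term_teager_energy(frame):
    """Mean of squared TEO values over one frame (assumed definition)."""
    psi = teager_energy(frame)
    if not psi:  # frames shorter than 3 samples have no TEO output
        return 0.0
    return sum(v * v for v in psi) / len(psi)


def mean_frequency(frame, fs):
    """Power-weighted mean frequency in Hz, using a naive one-sided DFT."""
    n = len(frame)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        weighted += (k * fs / n) * power
        total += power
    return weighted / total if total > 0 else 0.0


def frame_feature(frame, fs):
    """Ratio of short-term Teager energy to mean frequency for one frame."""
    mf = mean_frequency(frame, fs)
    return short_term_teager_energy(frame) / mf if mf > 0 else 0.0


def label_frames(signal, fs, frame_len, threshold):
    """Label non-overlapping frames: True for speech, False for pause.

    The fixed threshold is a placeholder for whatever decision rule
    the paper actually uses.
    """
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        labels.append(frame_feature(frame, fs) > threshold)
    return labels
```

For example, calling `label_frames` on a signal made of a silent frame, a 1 kHz tone frame, and another silent frame (8 kHz sampling, 64-sample frames, threshold `1e-4`) marks only the middle frame as speech. Real speech would need longer frames, windowing, and a threshold tuned to the noise level.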

Keywords