Измерение, мониторинг, управление, контроль (Jun 2022)

EMD-BASED TECHNIQUE FOR SPEECH SIGNAL PROCESSING

  • A.K. Alimuradov,
  • A.Yu. Tychkov,
  • P.P. Churakov,
  • A.V. Baranova,
  • D.S. Dudnikov

DOI
https://doi.org/10.21685/2307-5538-2022-2-10
Journal volume & issue
no. 2

Abstract

Background. Selecting and validating an optimal set of informative speech signal parameters depends on the processing methods used and on accurate evaluation of the results obtained. The purpose of this work is to improve the efficiency of speech signal processing by expanding the space of informative amplitude, time, frequency, and energy characteristics of speech using adaptive time-frequency analysis methods.

Materials and methods. Empirical mode decomposition, an adaptive technique for decomposing non-stationary data into frequency components that requires no a priori information about the analyzed signal, was used. The method was implemented in software in the MATLAB (The MathWorks, Inc.) mathematical modeling environment.

Results. A technique for speech signal processing based on empirical mode decomposition has been developed. The technique uniformly splits the original speech signal into fragments, applies empirical mode decomposition to each fragment, and forms mode speech signals. The technique was investigated, and the following results were analyzed: the number of empirical modes, the difference between the original and reconstructed signals, and the time required to form the mode speech signals.

Conclusions. The research results show that the developed technique expands the space of informative characteristics by forming a set of new mode speech signals with minimal error. The difference between the original and reconstructed signals is kept below 0.001 V, which is considered necessary and sufficient. The developed technique can be used effectively to form an optimal set of speech parameters for detecting and classifying naturally expressed human psycho-emotional states.
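The three steps named in the Results (uniform splitting, empirical mode decomposition of each fragment, formation of mode speech signals) can be sketched roughly as follows. This is a minimal Python illustration, not the authors' MATLAB implementation. Several things here are assumptions not stated in the abstract: all function names are hypothetical, envelopes are piecewise-linear rather than the cubic splines commonly used in empirical mode decomposition, sifting runs for a fixed number of iterations, and mode signal k is taken to be the concatenation of the k-th mode of every fragment (zero-filled where a fragment yields fewer modes). The test signal is synthetic.

```python
import math

def split_fragments(signal, frag_len):
    """Uniformly split a signal into consecutive fragments of frag_len samples."""
    return [signal[i:i + frag_len] for i in range(0, len(signal), frag_len)]

def _extrema(x):
    """Return indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def _envelope(x, idx):
    """Piecewise-linear envelope through x at idx, held flat at both ends.
    Assumption: linear interpolation stands in for the usual cubic spline."""
    pts = [(0, x[idx[0]])] + [(i, x[i]) for i in idx] + [(len(x) - 1, x[idx[-1]])]
    env = []
    for (i0, y0), (i1, y1) in zip(pts, pts[1:]):
        for i in range(i0, i1):
            env.append(y0 + (y1 - y0) * (i - i0) / (i1 - i0))
    env.append(pts[-1][1])
    return env

def emd(x, max_modes=8, sift_iters=10):
    """Return (modes, residue) with sum(modes) + residue equal to x up to rounding."""
    residue, modes = list(x), []
    for _ in range(max_modes):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to build envelopes
        h = list(residue)
        for _ in range(sift_iters):  # assumption: fixed sifting count
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper, lower = _envelope(h, maxima), _envelope(h, minima)
            h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        modes.append(h)
        residue = [r - m for r, m in zip(residue, h)]
    return modes, residue

def mode_speech_signals(signal, frag_len):
    """Concatenate the k-th mode of every fragment into mode signal k (assumed scheme)."""
    decomposed = [emd(f) for f in split_fragments(signal, frag_len)]
    n_modes = max((len(m) for m, _ in decomposed), default=0)
    mode_signals = [[] for _ in range(n_modes)]
    residue_signal = []
    for modes, residue in decomposed:
        for k in range(n_modes):
            # Zero-fill when this fragment produced fewer than n_modes modes.
            mode_signals[k].extend(modes[k] if k < len(modes) else [0.0] * len(residue))
        residue_signal.extend(residue)
    return mode_signals, residue_signal

# Synthetic two-tone test signal (illustrative only, not speech data).
fs = 8000
signal = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 1200 * t / fs)
          for t in range(2400)]
mode_signals, residue_signal = mode_speech_signals(signal, frag_len=800)
reconstructed = [sum(vals) for vals in zip(residue_signal, *mode_signals)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(len(mode_signals), max_error)
```

In this sketch each mode is subtracted from the running residue, so the modes and the final residue sum back to the fragment up to floating-point rounding. The reconstruction difference therefore reflects arithmetic precision rather than the choice of envelope, which is one way to check a result such as the abstract's stated bound of 0.001 V.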

Keywords