International Journal of Information and Communication Technology Research (Jun 2016)
Hybrid of Evolutionary and Swarm Intelligence Algorithms for Prosody Modeling in Natural Speech Synthesis
Abstract
To reduce the number of input features to a prosody generator in a natural speech synthesis application, this study uses a hybrid of an evolutionary algorithm and a swarm intelligence-based algorithm for feature selection (FS). The input features to the FS unit are word-level and syllable-level linguistic features. The word-level features include punctuation information, part-of-speech tags, semantic indicators, and word length. The syllable-level features include the phonemic structure of the current syllable and an indicator of its position in the word. A modified Elman-type dynamic neural network (DNN) is used for prosody generation. The output layer of this DNN provides syllable-level prosody information, including pitch contour, log-energy level, duration information, and pause data. Simulation results show that this hybrid soft-computing method predicts the prosody information with an acceptable error compared with an Elman-type neural network prosody generator and a binary gravitational search algorithm-based FS unit.
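The pipeline described in the abstract, a binary feature-selection mask feeding a recurrent prosody generator, can be illustrated with a minimal sketch. It is not the authors' implementation: the names `apply_mask` and `ElmanNetwork`, the layer sizes, the random weight initialization, and the choice of four scalar outputs (pitch, log-energy, duration, pause) are all assumptions made for illustration. The sketch shows a plain Elman-type network rather than the modified variant used in the study, and it omits both training and the hybrid search that would choose the mask.

```python
import math
import random


def apply_mask(features, mask):
    """Keep only the features whose mask bit is 1 (hypothetical FS output)."""
    return [x for x, keep in zip(features, mask) if keep]


class ElmanNetwork:
    """Plain Elman-type network: the context units hold a copy of the
    previous hidden activations. Sizes and weights are illustrative only."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)       # input -> hidden
        self.w_ctx = matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = matrix(n_out, n_hidden)     # hidden -> output
        self.context = [0.0] * n_hidden

    def step(self, x):
        """Process one syllable's feature vector and update the context."""
        hidden = []
        for j in range(len(self.w_in)):
            total = sum(w * v for w, v in zip(self.w_in[j], x))
            total += sum(w * c for w, c in zip(self.w_ctx[j], self.context))
            hidden.append(math.tanh(total))
        self.context = hidden  # copied back for the next syllable
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]


# Hypothetical example: 6 candidate linguistic features, 4 kept by the mask.
mask = [1, 0, 1, 1, 0, 1]
syllables = [
    [0.2, 0.9, 0.0, 1.0, 0.5, 0.3],
    [0.7, 0.1, 1.0, 0.0, 0.4, 0.8],
]
net = ElmanNetwork(n_in=sum(mask), n_hidden=5, n_out=4)
# Each prediction is assumed to hold [pitch, log_energy, duration, pause].
predictions = [net.step(apply_mask(s, mask)) for s in syllables]
```

In a wrapper-style setup, a search algorithm would propose candidate masks and score each one by the prediction error of a network trained on the masked inputs; the abstract does not specify the fitness function, so that part is left out here.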