Eesti Rakenduslingvistika Ühingu Aastaraamat (May 2017)

Varieeruva vältega sõnade hääldusuuringud kõnesünteesi teenistuses

  • Liisi Piits,
  • Mari-Liis Kalvik

DOI
https://doi.org/10.5128/ERYa13.08
Journal volume & issue
Vol. 13
pp. 123 – 140

Abstract

"Words of variable quantity degrees as a problem for speech synthesis"

Estonian text-to-speech synthesis determines pronunciation from the Dictionary of Standard Estonian (ÕS 2013), which is the basis of standard Estonian. However, for roughly 300 words, this dictionary allows pronunciation with both the second and the third quantity degree. This causes problems in the text-to-speech synthesis system, since the automatic text analysis cannot handle multiple outputs. One of the pronunciation variants must be given preference in the text analysis process, so it is important to identify which variant is more common in actual speech. The words chosen for the study are those that ÕS 2013 lists as being pronounced with both the second and the third quantity degree. The study is based on a reading experiment with 23 informants (15 women and 8 men), each of whom read 52 sentences aloud. These sentences contained 47 target words, i.e. words of variable quantity degree; in total, the study yielded 1080 pronunciation instances. The target words include those with varying vowel quantity degrees as well as those with varying consonant quantity degrees; the duration ratios characteristic of each quantity degree were calculated from the primary-stressed syllable and the unstressed syllable following it. The average duration ratio is 1.8 for the second quantity degree and 2.9 for the third. These results are similar to those obtained in previous studies. On the basis of the informants' pronunciation, the words were grouped into three categories: second quantity degree, variable quantity degree (where neither the second nor the third quantity degree accounted for more than 2/3 of all pronunciations) and third quantity degree.
Based on the duration ratio, 8 words fell into the second quantity degree group; based on auditory assessment, however, this group grew to 17 words. The variable quantity degree group contained 15 words based on duration ratios, but only 5 words based on auditory assessment. The third quantity degree group contained 24 words by duration ratio and 25 words by auditory assessment. Finding trends in word structure among words in the same quantity degree group would make it possible to draw inferences about other words of the same type, which would increase the applied value of the study. In general, though, words of the same syllable structure and part of speech did not exhibit the same pronunciation patterns. It can at least be stated, however, that the third quantity degree dominated among both two- and three-syllable adjectives formed with the suffix -lik. Of the 15 such words analysed in the study, only two were pronounced predominantly with the second quantity degree.

In this article we present a reading experiment used to study the pronunciation of so-called words of variable quantity degree. In defining such words we follow the norms of the Dictionary of Standard Estonian (ÕS 2013): we study words that may be pronounced in either the second or the third quantity degree. We analyse the duration ratios of the primary-stressed syllable and the syllable following it, and compare these data with the results of auditory assessment. We examine whether words with a similar syllable structure and the same part of speech share pronunciation features. From an applied perspective, the study is motivated by the need to solve the problems that this variation causes in text-to-speech synthesis. The study revealed the main trends in pronunciation preferences for words of variable quantity degree. On the basis of both syllable duration ratios and auditory assessment, clear groups of words emerged in which one or the other quantity degree dominated.
Analysis by word type also made it possible to identify trends in quantity variation for each type; for example, all three-syllable adjectives with the suffix -lik, with one exception, were pronounced in the third quantity degree.
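
The grouping rule described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names, the labels "Q2" and "Q3", and the millisecond inputs are assumptions made for illustration. The abstract reports mean duration ratios (1.8 and 2.9) but no cut-off between the quantity degrees, so the sketch takes per-instance labels as given rather than deriving them from a ratio.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical labels for the second and third quantity degree.
Q2, Q3, VARIABLE = "Q2", "Q3", "variable"


def duration_ratio(stressed_ms, following_ms):
    """Duration of the primary-stressed syllable divided by the
    duration of the unstressed syllable that follows it."""
    if stressed_ms <= 0 or following_ms <= 0:
        raise ValueError("syllable durations must be positive")
    return stressed_ms / following_ms


def classify_word(labels, threshold=Fraction(2, 3)):
    """Group one target word by its informants' pronunciations.

    `labels` holds one "Q2" or "Q3" label per pronunciation instance
    (any other label is ignored). A word belongs to a quantity degree
    group only if that degree accounts for MORE than `threshold` of the
    instances; otherwise it counts as variable.
    """
    counts = Counter(labels)
    total = counts[Q2] + counts[Q3]
    if total == 0:
        raise ValueError("no Q2/Q3 pronunciation instances given")
    for degree in (Q2, Q3):
        # Exact rational comparison avoids float rounding at the boundary.
        if Fraction(counts[degree], total) > threshold:
            return degree
    return VARIABLE


if __name__ == "__main__":
    # Invented example: 23 informants, 20 of whom used the third degree.
    print(classify_word([Q3] * 20 + [Q2] * 3))
    # Invented example: a near-even split counts as variable.
    print(classify_word([Q2] * 12 + [Q3] * 11))
```

A word split exactly 2/3 to 1/3 counts as variable here, because the abstract requires a quantity degree to account for more than 2/3 of pronunciations before it defines the group.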

Keywords