Jurnal Teknologi Informasi dan Ilmu Komputer (Aug 2020)

Pause Prediction in Pontianak Malay Sentence Utterances Using a Part-of-Speech-Based Hidden Markov Model

  • Arif Bijaksana Putra Negara,
  • Hafiz Muhardi,
  • Evi Fathiyah Muniyati

Journal volume & issue
Vol. 7, no. 4
pp. 755 – 764

Abstract


Pause information is one of the factors behind high-quality speech output from a Text-to-Speech (TTS) system. Earlier research predicted pauses in Pontianak Malay with other methods but did not achieve good results. This study predicts pauses in Pontianak Malay sentences with a Hidden Markov Model (HMM) based on part-of-speech (PoS) tags. The HMM computes the probability of every possible pause assignment. The data consist of recordings of speakers reading 500 Pontianak Malay sentences, together with a new PoS tag set derived from several existing tag sets. The system outputs Pontianak Malay sentences annotated with their predicted pauses.
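The abstract does not spell out the model structure. As a minimal sketch, assume a bigram HMM whose hidden states are the pause indices at each word juncture ("0" none, "1" short, "2" long) and whose observations are the words' PoS tags; the most probable label sequence is then found with Viterbi decoding. The tag names (`PRON`, `VERB`, `NOUN`, `ADP`), the constant names, and all probabilities below are hypothetical toy values, not the paper's PoS set or estimates.

```python
import math
from typing import Dict, List

# Pause indices predicted at each word juncture (from the abstract).
# "," and "." follow directly from punctuation, so this toy model leaves them out.
LABELS = ["0", "1", "2"]

# Hypothetical toy probabilities; real values would be estimated from an annotated corpus.
START_P = {"0": 0.6, "1": 0.3, "2": 0.1}
TRANS_P = {  # P(label_t | label_{t-1}): a bigram over pause labels (assumption)
    "0": {"0": 0.5, "1": 0.3, "2": 0.2},
    "1": {"0": 0.6, "1": 0.2, "2": 0.2},
    "2": {"0": 0.7, "1": 0.2, "2": 0.1},
}
EMIT_P = {  # P(PoS tag | pause label); tag names are hypothetical
    "0": {"PRON": 0.4, "VERB": 0.3, "NOUN": 0.2, "ADP": 0.1},
    "1": {"PRON": 0.1, "VERB": 0.3, "NOUN": 0.4, "ADP": 0.2},
    "2": {"PRON": 0.1, "VERB": 0.2, "NOUN": 0.5, "ADP": 0.2},
}


def _log(p: float) -> float:
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(pos_tags: List[str]) -> List[str]:
    """Most probable pause-label sequence for a sentence's PoS tags (bigram HMM)."""
    if not pos_tags:
        return []
    # best[t][s]: best log-probability of any label path ending in label s at word t
    best: List[Dict[str, float]] = [
        {s: _log(START_P[s]) + _log(EMIT_P[s].get(pos_tags[0], 0.0)) for s in LABELS}
    ]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(pos_tags)):
        best.append({})
        back.append({})
        for s in LABELS:
            # Best predecessor label for current label s
            prev = max(LABELS, key=lambda r: best[t - 1][r] + _log(TRANS_P[r][s]))
            best[t][s] = (best[t - 1][prev] + _log(TRANS_P[prev][s])
                          + _log(EMIT_P[s].get(pos_tags[t], 0.0)))
            back[t][s] = prev
    # Trace back from the best final label
    path = [max(LABELS, key=lambda s: best[-1][s])]
    for t in range(len(pos_tags) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["PRON", "VERB", "NOUN", "ADP"]))  # ['0', '0', '1', '0']
```

With these toy numbers the decoder places a short pause after the noun, before the preposition. If the paper's bigram/trigram distinction refers to the length of the label history (an assumption), a trigram variant would condition each label on the previous two, which in Viterbi terms means tracking pairs of labels as states.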
The pause indices fall into five categories: "0" indicates no pause, "1" a short pause, "2" a long pause, "," a comma, and "." the end of the sentence. The predictions are evaluated with a whole-sentence pause-match accuracy test and with precision, recall, and F-measure. Two pause phrases are tested, pause phrase 1+2 and pause phrase 2, and the bigram and trigram models are compared. The tests show that the trigram model predicts speech pauses in Pontianak Malay sentences better than the bigram model.
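The abstract names the metrics but not their exact definitions. The sketch below assumes that each sentence is a list of pause indices, one per juncture; that "pause phrase 1+2" counts indices "1" and "2" as breaks while "pause phrase 2" counts only "2"; and that sentence-level accuracy means the whole predicted sequence matches the reference. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Labels = List[str]


def break_scores(reference: Sequence[Labels], predicted: Sequence[Labels],
                 break_labels: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for junctures counted as breaks.

    Assumption: break_labels={"1", "2"} models "pause phrase 1+2" and
    break_labels={"2"} models "pause phrase 2".
    """
    tp = fp = fn = 0
    for ref_seq, pred_seq in zip(reference, predicted):
        for ref, pred in zip(ref_seq, pred_seq):
            ref_break, pred_break = ref in break_labels, pred in break_labels
            tp += int(ref_break and pred_break)       # break found where expected
            fp += int(pred_break and not ref_break)   # spurious break
            fn += int(ref_break and not pred_break)   # missed break
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


def sentence_accuracy(reference: Sequence[Labels], predicted: Sequence[Labels]) -> float:
    """Share of sentences whose whole predicted pause sequence matches the reference."""
    if not reference:
        return 0.0
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical toy annotations: one pause index per juncture, punctuation included.
REFERENCE = [["0", "1", "0", "."], ["0", "2", "0", ",", "0", "."], ["0", "0", "."]]
PREDICTED = [["0", "0", "1", "."], ["0", "2", "0", ",", "1", "."], ["0", "0", "."]]

if __name__ == "__main__":
    print("pause phrase 1+2:", break_scores(REFERENCE, PREDICTED, {"1", "2"}))
    print("pause phrase 2:", break_scores(REFERENCE, PREDICTED, {"2"}))
    print("sentence accuracy:", sentence_accuracy(REFERENCE, PREDICTED))
```

Running the bigram and trigram predictions for the same test sentences through these functions would give the side-by-side comparison the abstract describes.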