A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis

Klangjai Tammanam; Nuttachot Promrit; Sajjaporn Waijanya

Engineering and Applied Science Research (Jul 2021)


Journal volume & issue: Vol. 48, no. 5
pp. 614 – 626

Abstract

Pali Sandhi is a phonetic transformation that merges two words into a new word: the phonemes of the neighbouring words are changed and fused. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying sample Sandhi words into five classes with a bidirectional long short-term memory (BiLSTM) model. We then applied the rules associated with each class to reconstruct the component words at the predicted splitting locations. We identified 6,345 Pali Sandhi words from the Dhammapada Atthakatha. We evaluated the proposed model on the accuracy of its splitting locations by comparing its output with the reference segmentations in the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted to have no splitting location, and 5.83% did not match the reference answers (incomplete segmentation).
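To make the two-stage pipeline concrete, here is a minimal Python sketch of the rule-based stage and the evaluation buckets. It assumes, purely for illustration, that the BiLSTM emits one class per character, where 0 means "no split here" and 1 to 4 each select a rewrite rule (five classes in total). The abstract does not say what the five classes are, so the label scheme, the four rules, and the names `RULES`, `labels_from_scores`, `segment`, and `evaluate` are all hypothetical. The BiLSTM itself is not implemented; hand-written labels stand in for its output. The examples use two familiar Pali Sandhi forms, *natthi* (na + atthi) and *tatrāyaṃ* (tatra + ayaṃ).

```python
from collections import Counter

# HYPOTHETICAL label scheme (not taken from the paper):
# 0 = no split at this character; 1-4 = split here using rule class 1-4.

def rule_plain(word, i):
    """Class 1 (hypothetical): split the surface form as-is."""
    return word[:i], word[i:]

def rule_long_a(word, i):
    """Class 2 (hypothetical): a long 'ā' at i merged final 'a' + initial 'a'."""
    return word[:i] + "a", "a" + word[i + 1:]

def rule_elided_a(word, i):
    """Class 3 (hypothetical): the left word's final 'a' was elided."""
    return word[:i] + "a", word[i:]

def rule_elided_e(word, i):
    """Class 4 (hypothetical): the left word's final 'e' was elided."""
    return word[:i] + "e", word[i:]

RULES = {1: rule_plain, 2: rule_long_a, 3: rule_elided_a, 4: rule_elided_e}

def labels_from_scores(scores):
    """Pick the highest-scoring class per character (stand-in for BiLSTM output)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def segment(word, labels):
    """Split `word` at every non-zero label, applying that label's rule.

    Splits are applied right to left: each rule leaves the text before the
    split point unchanged, so the indices of earlier splits stay valid.
    """
    parts = []
    current = word
    for i in reversed([i for i, lab in enumerate(labels) if lab != 0]):
        left, right = RULES[labels[i]](current, i)
        parts.insert(0, right)
        current = left
    parts.insert(0, current)
    return parts

def evaluate(words, predicted, gold):
    """Bucket predictions into the three outcomes named in the abstract (in %)."""
    outcome = Counter()
    for word, pred, ref in zip(words, predicted, gold):
        if pred == ref:
            outcome["correct"] += 1
        elif pred == [word]:
            outcome["no_split"] += 1  # model predicted no splitting location
        else:
            outcome["mismatch"] += 1  # split, but not matching the reference
    total = len(words)
    return {k: 100.0 * outcome[k] / total
            for k in ("correct", "no_split", "mismatch")}

if __name__ == "__main__":
    words = ["natthi", "tatrāyaṃ", "natthi"]
    label_seqs = [
        [0, 3, 0, 0, 0, 0],        # split after "n", restore the elided 'a'
        [0, 0, 0, 0, 2, 0, 0, 0],  # split the long 'ā' into 'a' + 'a'
        [0, 0, 0, 0, 0, 0],        # the model missed the split
    ]
    gold = [["na", "atthi"], ["tatra", "ayaṃ"], ["na", "atthi"]]
    predicted = [segment(w, labs) for w, labs in zip(words, label_seqs)]
    print(predicted)
    print(evaluate(words, predicted, gold))
```

Applying splits from right to left is a design choice of this sketch that lets a single word carry several splitting locations without re-indexing. Real Pali text would also need consistent Unicode normalization (for example, precomposed "ā" and "ṃ") before character-level labelling.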

Published in Engineering and Applied Science Research

ISSN: 2539-6161 (Print); 2539-6218 (Online)
Publisher: Khon Kaen University
Country of publisher: Thailand
LCC subjects: Technology: Technology (General)
Website: https://www.tci-thaijo.org/index.php/easr/index