Journal of Applied Informatics and Technology (วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์) (Apr 2024)

Thai Word Segmentation using a Replacing the English Alphabet Approach to Enhance Thai Text Sentiment Analysis

  • Sumonta Kasemvilas

DOI
https://doi.org/10.14456/jait.2024.10
Journal volume & issue
Vol. 6, no. 2
pp. 158 – 178

Abstract


Thai word segmentation is an important method used in several document analysis applications. Dictionary-based techniques are popular for Thai word segmentation because of their high accuracy. However, these techniques are prone to errors, especially when some words are not in the dictionary. One solution to this problem is to add more vocabulary to the dictionary. Moreover, traditional techniques cannot be applied to segment misspelled words. Therefore, this research proposes a new Thai word segmentation method that replaces Thai letters with English letters. Replacing the English alphabet (REA) is a novel approach that generates short English character sequences, in various formats, that preserve the writing structure of the original Thai text. This approach improves the accuracy of Thai word segmentation and thereby the accuracy of Thai text classification and sentiment analysis. The evaluation uses Thai social media messages and comments on Thai posts from Pantip, each labeled with its sentiment (positive, neutral, or negative). The REA approach combined with the TF-G and RF techniques outperforms the other methods, and the experimental results appear acceptable when compared with those of earlier well-known studies.
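The abstract does not specify the REA character mapping or the segmentation algorithm, so the following is only a rough illustration of the general idea under stated assumptions. It uses a hypothetical one-to-one Thai-to-Latin letter map (`THAI_TO_LATIN`, covering just the seven characters in the demo) together with a plain greedy longest-matching dictionary segmenter. The names `to_latin`, `longest_match_segment`, and `segment_via_latin` are illustrative and do not come from the paper.

```python
# Hypothetical one-to-one Thai-to-Latin map covering only the characters in the demo.
# This is NOT the REA mapping from the paper; the abstract does not give one.
THAI_TO_LATIN = {
    "\u0e01": "k",  # KO KAI
    "\u0e34": "i",  # SARA I (vowel written above)
    "\u0e19": "n",  # NO NU
    "\u0e02": "x",  # KHO KHAI
    "\u0e49": "t",  # MAI THO tone mark
    "\u0e32": "a",  # SARA AA
    "\u0e27": "w",  # WO WAEN
}


def to_latin(text: str) -> str:
    """Replace each mapped Thai character with one Latin letter; others pass through."""
    return "".join(THAI_TO_LATIN.get(ch, ch) for ch in text)


def longest_match_segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right longest matching; unmatched characters become 1-char tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:  # no dictionary word starts at position i
            tokens.append(text[i])
            i += 1
    return tokens


def segment_via_latin(thai_text: str, thai_dictionary: set[str]) -> list[str]:
    """Segment in the Latin space, then cut the Thai text at the same offsets."""
    latin_dict = {to_latin(w) for w in thai_dictionary}
    tokens = longest_match_segment(to_latin(thai_text), latin_dict)
    out, i = [], 0
    for tok in tokens:  # a one-to-one map keeps token lengths identical in both scripts
        out.append(thai_text[i:i + len(tok)])
        i += len(tok)
    return out


print(segment_via_latin("กินข้าว", {"กิน", "ข้าว"}))  # ['กิน', 'ข้าว']
```

Because each Thai character becomes exactly one Latin letter here, token lengths in the Latin string map directly back to Thai offsets. A mapping that emits more than one letter per Thai character would need an explicit offset table instead.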

Keywords