SoftwareX (May 2024)

Morpheme-based Korean text cohesion analyzer

  • Dong-Hyun Kim,
  • Seokho Ahn,
  • Euijong Lee,
  • Young-Duk Seo

Journal volume & issue
Vol. 26
p. 101659

Abstract


The fundamental difference between Korean and English text analysis lies in morpheme analysis. Existing Korean text analysis relies on tools designed for English, and it often yields inaccurate results because Korean morpheme analysis is difficult. The primary reason is that existing morpheme analyzers depend on eojeol tokens, which makes it hard to capture the characteristics of Korean. We therefore introduce a Transformer-based morpheme analyzer that uses morpheme tokens to capture the inherent features of Korean sentences. We then successfully integrate this morpheme analyzer into our Korean text analysis tool and offer the tool as a web service for efficient use.
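To make the eojeol/morpheme distinction concrete, the following toy sketch contrasts the two token types. An eojeol is a whitespace-separated unit of Korean text and often bundles a content word with grammatical particles. A morpheme tokenizer separates those pieces. This is not the paper's Transformer-based analyzer. The particle list and the function names `eojeol_tokens` and `toy_morpheme_tokens` are hypothetical and used only for illustration. The suffix matching is a crude stand-in that, for example, leaves verb endings unsplit.

```python
# Toy illustration of eojeol tokens vs. morpheme tokens in Korean.
# HYPOTHETICAL: a tiny hand-written particle list stands in for a real
# morpheme analyzer (the paper uses a Transformer-based model instead).

# Longer particles come first so that "에서" is tried before "에".
PARTICLES = ["에서", "는", "은", "를", "을", "가", "이", "에"]


def eojeol_tokens(sentence: str) -> list[str]:
    """Eojeol are the whitespace-separated units of Korean text."""
    return sentence.split()


def toy_morpheme_tokens(sentence: str) -> list[str]:
    """Split each eojeol into a stem plus one trailing particle, if one matches."""
    tokens: list[str] = []
    for eojeol in eojeol_tokens(sentence):
        for particle in PARTICLES:
            # Require a non-empty stem so a bare particle is not split away.
            if eojeol.endswith(particle) and len(eojeol) > len(particle):
                tokens.append(eojeol[: -len(particle)])
                tokens.append(particle)
                break
        else:
            # No particle matched: keep the eojeol whole (e.g. verb forms).
            tokens.append(eojeol)
    return tokens


if __name__ == "__main__":
    sentence = "나는 학교에서 책을 읽었다"  # "I read a book at school."
    print(eojeol_tokens(sentence))
    print(toy_morpheme_tokens(sentence))
```

For the sample sentence, the eojeol tokens are `나는`, `학교에서`, `책을`, `읽었다`. The toy morpheme tokens separate the stems `나`, `학교`, `책` from the particles `는`, `에서`, `을`. A real analyzer would also split `읽었다` into its stem and endings, which this suffix list deliberately does not attempt.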

Keywords