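Analisis Morfologi untuk Menangani Out-of-Vocabulary Words pada Part-of-Speech Tagger Bahasa Indonesia Menggunakan Hidden Markov Model
[English: Morphological Analysis for Handling Out-of-Vocabulary Words in an Indonesian Part-of-Speech Tagger Using a Hidden Markov Model]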

Febyana Ramadhanti; Yudi Wibisono; Rosa Ariani Sukamto

doi:10.26418/jlk.v2i1.13

Jurnal Linguistik Komputasional (Mar 2019)

Journal volume & issue: Vol. 2, no. 1
pp. 6 – 12

Abstract

Part-of-speech (PoS) tagging is a natural language processing (NLP) task that assigns a part-of-speech tag to each word in an input sentence. A hidden Markov model (HMM) PoS tagger is probabilistic, so its performance depends heavily on the training corpus. The limited coverage of the training corpus, combined with the breadth of the Indonesian vocabulary, gives rise to the problem of out-of-vocabulary (OOV) words. This research compares an HMM PoS tagger that uses a morphological analysis (AM) method with an HMM PoS tagger without AM, using the same training and test corpora. The test corpus contains 6,676 tokens (740 sentences) with an OOV rate of 30%. The plain HMM system achieved 97.54% accuracy, while the HMM system with morphological analysis reached 99.14% at its highest.

Published in Jurnal Linguistik Komputasional

ISSN: 2621-9336 (Online)
Publisher: Indonesia Association of Computational Linguistics (INACL)
Country of publisher: Indonesia
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: http://inacl.id/journal/index.php/jlk
