Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Dec 2023)

ANoM STEMMER: Nazief & Andriani Modification for Madurese Stemming

  • Enni Lindrawati,
  • Ema Utami,
  • Aiinul Yaqin

DOI
https://doi.org/10.29207/resti.v7i6.5086
Journal volume & issue
Vol. 7, no. 6
pp. 1341 – 1347

Abstract

Madurese is one of the regional languages of Indonesia and a cultural asset that needs to be preserved. Its distinctive features and word-formation rules make it a suitable subject for stemming, a core task in information retrieval. Madurese is closely related to Javanese, for which modifications of the Nazief and Adriani stemming method have been reported to perform well; however, such a modification has not previously been studied or validated for Madurese. Previous studies have also not used the morphophonemic rules that influence word formation in Madurese. This research therefore modifies the Nazief and Adriani algorithm for Madurese on the basis of Madurese morphology, removing affixes, namely ter-ater (prefixes) and panoteng (suffixes), and applying morphophonemic rules. The corpus consists of 1000 affixed words taken from a Madurese dictionary. The algorithm stems 890 of these words correctly, giving an overall accuracy of 89.0% and an error rate of 11%; by affix type, accuracy is 93.81% for prefixes, 83.78% for suffixes, and 80.07% for confixes. Understemming is found in 104 words and overstemming in 6 words. The time it takes to compile is 31.31 seconds.
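
As a rough illustration of the approach described in the abstract, the following minimal Python sketch shows dictionary-guarded affix stripping in the spirit of Nazief and Adriani, followed by the accuracy arithmetic applied to the reported counts. It is not the authors' ANoM implementation. The affix tuples, the root tokens, and the function names `stem` and `summarize` are assumptions made up for this example; they are not the ter-ater or panoteng inventories from the paper, and morphophonemic rules are not modeled.

```python
# Hypothetical affix inventories: placeholders for illustration only,
# NOT the ter-ater / panoteng lists used in the paper.
PREFIXES = ("ta", "pa", "a")
SUFFIXES = ("aghi", "an", "na")

# Placeholder root dictionary; these tokens are not claimed to be
# real Madurese vocabulary.
ROOTS = {"kala", "rema"}


def stem(word, root_words):
    """Strip at most one suffix and one prefix, accepting a candidate
    only if it appears in the root dictionary."""
    if word in root_words:
        return word

    # Candidate forms: the word itself, plus the word minus each matching suffix.
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            candidates.append(word[: -len(suffix)])

    for candidate in candidates:
        if candidate in root_words:
            return candidate
        for prefix in PREFIXES:
            remainder = candidate[len(prefix):]
            if candidate.startswith(prefix) and remainder in root_words:
                return remainder

    # No dictionary match: return the word unchanged.
    return word


def summarize(total, correct, understemmed, overstemmed):
    """Accuracy and error rate as percentages of the corpus size."""
    errors = understemmed + overstemmed
    return {
        "accuracy_pct": 100 * correct / total,
        "error_pct": 100 * errors / total,
    }


if __name__ == "__main__":
    for token in ("takala", "remaan", "paremana", "xyz"):
        print(token, "->", stem(token, ROOTS))
    # Counts reported in the abstract: 1000 words, 890 correct,
    # 104 understemmed, 6 overstemmed.
    print(summarize(1000, 890, 104, 6))
```

With the counts from the abstract, the 104 understemmed and 6 overstemmed words add up to 110 errors out of 1000, which matches the reported 11% error rate and 89.0% accuracy. The dictionary check before accepting a stripped form is the design choice that limits overstemming, at the cost of leaving a word unchanged when its root is missing from the dictionary.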

Keywords