Intensif: Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi (Aug 2023)
Comparison of Modified Nazief&Adriani and Modified Enhanced Confix Stripping algorithms for Madurese Language Stemming
Abstract
The Madurese language has a distinctive morphology that can be exploited to recover the root (base) form of a word, a process called stemming. Stemming can serve as a building block for applications that translate Madurese into Indonesian or other languages, and it can support the development of plagiarism detection systems for Madurese text. Research on stemming for Madurese is still rare. This study therefore aims to find Madurese root words using a modified Nazief & Adriani algorithm and a modified Enhanced Confix Stripping (ECS) algorithm. The study used 1000 Madurese words, consisting of 630 prefixed words, 74 suffixed words, and 296 confixed words. The modified Nazief & Adriani algorithm performed better, with an accuracy of 88.8%, overstemming of 0.7%, and understemming of 10.5%. The modified ECS algorithm reached an accuracy of 74.0%, with 0.4% overstemming and 25.6% understemming. On the same stemming task, the modified Nazief & Adriani algorithm was also faster: it took 13.31 seconds, while the modified ECS algorithm took 210.88 seconds.
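The reported figures can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: it assumes that each percentage is taken over all 1000 test words, so the per-category counts (888, 7, 105 and 740, 4, 256) are derived from the percentages rather than stated in the abstract. The function name `stemming_rates` is hypothetical.

```python
def stemming_rates(correct: int, overstemmed: int, understemmed: int) -> dict[str, float]:
    """Return accuracy, overstemming and understemming as percentages of all words."""
    total = correct + overstemmed + understemmed
    return {
        "accuracy": round(100 * correct / total, 1),
        "overstemming": round(100 * overstemmed / total, 1),
        "understemming": round(100 * understemmed / total, 1),
    }

# Test set composition given in the abstract: prefixed, suffixed, confixed words.
total_words = 630 + 74 + 296

# Counts implied by the reported percentages, ASSUMING a 1000-word denominator.
nazief_adriani = stemming_rates(correct=888, overstemmed=7, understemmed=105)
ecs = stemming_rates(correct=740, overstemmed=4, understemmed=256)

# Ratio of the reported running times (ECS time / Nazief & Adriani time).
time_ratio = round(210.88 / 13.31, 2)

print(total_words)
print(nazief_adriani)
print(ecs)
print(time_ratio)
```

Under these assumptions the three rates for each algorithm sum to 100%, and the reported times put the modified ECS run at roughly 16 times the duration of the modified Nazief & Adriani run.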
Keywords