Vietnam Journal of Computer Science (May 2020)
Simultaneous Removal of Prefix and Suffix
Abstract
This work is an attempt to devise a Stemmer that can remove both prefix and suffix together from a given word in English language. For a given input word, our method considers all possible internal N-grams for detection of potential stems. We frame a hypothesis where the stem length is closest to the half of the length of the input word. A standard English dictionary has been employed to identify morphologically correct N-grams in the process. We apply our techniques over a random sample of 100 English words, each possessing both prefix and suffix. We also compare our proposed Stemmer with three standard algorithms from the literature. Empirical results exhibit that our technique performs better than the rest of the stemmers.
Keywords