Vietnam Journal of Computer Science (May 2020)

Simultaneous Removal of Prefix and Suffix

  • Pawan Tamta
  • B. P. Pande

DOI
https://doi.org/10.1142/S2196888820500074
Journal volume & issue
Vol. 7, no. 2
pp. 129 – 144

Abstract

This work attempts to devise a stemmer that can remove both the prefix and the suffix together from a given word in the English language. For a given input word, our method considers all possible internal N-grams as potential stems. We frame a hypothesis under which the stem length is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare the proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers.
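The idea outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy dictionary, the `min_len` cutoff, the exclusion of the full word from the candidates, and the tie-breaking rule (prefer the longer N-gram) are all assumptions made for illustration. The sketch enumerates every contiguous N-gram of the input word, keeps those found in a dictionary, and returns the one whose length is closest to half the word length.

```python
def internal_ngrams(word):
    """Return every contiguous substring (character N-gram) of `word`."""
    n = len(word)
    return [word[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def candidate_stem(word, dictionary, min_len=3):
    """Pick the dictionary N-gram whose length is closest to len(word) / 2.

    Assumptions (not taken from the paper): N-grams shorter than `min_len`
    and the full word itself are ignored, ties are broken in favour of the
    longer N-gram, and the word is returned unchanged if nothing matches.
    """
    target = len(word) / 2
    candidates = [
        gram
        for gram in internal_ngrams(word)
        if len(gram) >= min_len and gram != word and gram in dictionary
    ]
    if not candidates:
        return word
    # Sort key: distance from half the word length, then longer N-grams first.
    return min(candidates, key=lambda gram: (abs(len(gram) - target), -len(gram)))


# Toy dictionary standing in for the standard English dictionary (hypothetical).
toy_dictionary = {"break", "able", "breakable", "agree", "agreement", "disagree"}

print(candidate_stem("unbreakable", toy_dictionary))   # prints "break"
print(candidate_stem("disagreement", toy_dictionary))  # prints "agree"
```

For "unbreakable" (11 letters, target length 5.5), the dictionary N-grams are "break", "able" and "breakable"; "break" is closest to the target, so the prefix "un" and the suffix "able" are dropped together.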

Keywords