ComTech (Dec 2017)

Question Categorization using Lexical Feature in Opini.id

  • Christian Eka Saputra,
  • Derwin Suhartono,
  • Rini Wongso

DOI
https://doi.org/10.21512/comtech.v8i4.4026
Journal volume & issue
Vol. 8, no. 4
pp. 229 – 234

Abstract

This research aimed to categorize questions posted on Opini.id. N-gram and Bag of Concept (BOC) were used as the lexical features and were combined with Naïve Bayes, Support Vector Machine (SVM), and J48 Tree as the classification methods. The experiments used data from an online media portal to categorize questions posted by users. The best accuracy, 96.5%, was obtained by combining Bigram Trigram Keyword (BTK) features with J48 Tree as the classifier. Meanwhile, the combination of Unigram Bigram (UB) and Unigram Bigram Keyword (UBK) features with attribute selection in WEKA achieved an accuracy of 95.94% using SVM as the classifier.
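
The feature combinations named above (UB, BTK) can be illustrated with a minimal sketch of word n-gram extraction. This is not the authors' implementation, which relied on WEKA. The function names, the `KW=` prefix, the sample question, and the keyword list are all hypothetical, and treating a "keyword" feature as a simple presence flag for a listed word is an assumption, since the abstract does not define it.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all word n-grams of order n, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(question, orders=(1, 2), keywords=()):
    """Count n-gram features for the given orders, plus keyword flags.

    `keywords` is a hypothetical hand-picked word list; each keyword found
    in the question adds one feature named "KW=<word>" (an assumption).
    """
    tokens = question.lower().split()
    features = Counter()
    for n in orders:
        features.update(ngrams(tokens, n))
    for word in keywords:
        if word in tokens:
            features["KW=" + word] += 1
    return features


# Hypothetical sample question, not taken from the Opini.id data.
question = "apa pendapat kamu tentang film ini"

# Unigram Bigram (UB): orders 1 and 2.
ub = extract_features(question, orders=(1, 2))

# Bigram Trigram Keyword (BTK): orders 2 and 3 plus keyword flags.
btk = extract_features(question, orders=(2, 3), keywords=("film",))
```

Feature dictionaries of this kind would then be turned into vectors and passed to a classifier such as Naïve Bayes, SVM, or a decision tree.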

Keywords