Cybernetics and Information Technologies (Dec 2016)

Chinese Text Auto-Categorization on Petro-Chemical Industrial Processes

  • Ni Jing,
  • Gao Ge,
  • Chen Pengyu

DOI
https://doi.org/10.1515/cait-2016-0078
Journal volume & issue
Vol. 16, no. 6
pp. 69 – 82

Abstract

The volume of corporate documents has grown rapidly in recent years. This paper aims to improve classification performance and to support the effective management of large collections of technical material in a domain-specific field. Taking petro-chemical processes as a case study, we examine in detail how parameter settings influence classification accuracy for two automatic text classification algorithms, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We present the advantages and disadvantages of the two algorithms in this field. Our tests also show that giving more attention to professional vocabulary can significantly improve the F1 value of both algorithms. These results offer a reference for future information classification in related industrial fields.
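
For readers unfamiliar with the F1 value mentioned above, the following is a minimal sketch of how it is commonly computed for a single category, as the harmonic mean of precision and recall. The function name `f1_score`, the category names, and the label lists are hypothetical illustrations; they are not taken from the paper's data or code.

```python
def f1_score(true_labels, predicted_labels, category):
    """Return the F1 value for one category (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    # Documents of this category that were assigned to it.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # Documents of other categories wrongly assigned to it.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # Documents of this category assigned elsewhere.
    fn = sum(1 for t, p in pairs if t == category and p != category)
    if tp == 0:
        return 0.0  # no correct assignments, so F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five documents (illustrative only).
truth = ["refining", "refining", "cracking", "cracking", "refining"]
predicted = ["refining", "cracking", "cracking", "cracking", "refining"]

refining_f1 = f1_score(truth, predicted, "refining")
cracking_f1 = f1_score(truth, predicted, "cracking")
```

With these made-up labels, "refining" has perfect precision but misses one document, while "cracking" finds all its documents but picks up one extra, so F1 penalizes both kinds of error.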

Keywords