Sintagma (Nov 2018)
The contribution of unsupervised machine learning methods to the design of methods for classifying texts according to their degree of specialization
Abstract
Modern theories of terminology rest on the hypothesis that texts have a degree of specialization that depends on several elements, both linguistic and extralinguistic. This article tests how useful unsupervised machine learning algorithms (specifically, the simple k-means algorithm) are for classifying texts according to their degree of specialization. To that end, a database containing intratextual and extratextual information is used as the data source. The results are compared with the class tags previously assigned by means of a numerical classification method. They suggest that such a degree exists and show that some texts fall on the boundaries between classes, which reveals that the class limits are vague and points to problems in the proposed method.
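The comparison described in the abstract (cluster the texts without labels, then check the clusters against previously assigned class tags) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the two numeric features per text, the toy values, the tag names "low", "medium" and "high", the farthest-point initialization, and the purity score are not taken from the article, whose actual features, tooling and evaluation measure are not stated in the abstract.

```python
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not the article's method):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style k-means: alternate assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each text goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def purity(clusters, tags):
    """Share of texts whose tag is the majority tag of their cluster."""
    table = Counter(zip(clusters, tags))  # (cluster, tag) -> count
    best = Counter()
    for (cluster, _tag), n in table.items():
        best[cluster] = max(best[cluster], n)
    return sum(best.values()) / len(tags)


# Hypothetical toy data: two invented features per text.
texts = [
    (0.05, 0.90), (0.10, 0.80), (0.08, 0.85),
    (0.45, 0.50), (0.50, 0.45), (0.48, 0.55),
    (0.90, 0.10), (0.85, 0.15), (0.95, 0.05),
]
# Hypothetical class tags standing in for the previously assigned ones.
tags = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

clusters, centroids = kmeans(texts, k=3)
print(sorted(Counter(zip(clusters, tags)).items()))
print(purity(clusters, tags))
```

On this well-separated toy data every cluster contains a single tag. With real texts, cells of the cross-tabulation that mix tags would be the place to look for texts lying near the boundary between two classes.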