An Automatic Document Classifier System Based on Genetic Algorithm and Taxonomy

Alan Diaz-Manriquez; Ana Bertha Rios-Alvarado; Jose Hugo Barron-Zambrano; Tania Yukary Guerrero-Melendez; Juan Carlos Elizondo-Leal

doi:10.1109/ACCESS.2018.2815992

IEEE Access (Jan 2018)

Affiliations

Alan Diaz-Manriquez: Facultad de Ingeniería y Ciencias, Universidad Autónoma de Tamaulipas, Ciudad Victoria, México
Ana Bertha Rios-Alvarado: Facultad de Ingeniería y Ciencias, Universidad Autónoma de Tamaulipas, Ciudad Victoria, México
Jose Hugo Barron-Zambrano: Facultad de Ingeniería y Ciencias, Universidad Autónoma de Tamaulipas, Ciudad Victoria, México
Tania Yukary Guerrero-Melendez: Facultad de Ingeniería y Ciencias, Universidad Autónoma de Tamaulipas, Ciudad Victoria, México
Juan Carlos Elizondo-Leal: Facultad de Ingeniería y Ciencias, Universidad Autónoma de Tamaulipas, Ciudad Victoria, México

DOI: https://doi.org/10.1109/ACCESS.2018.2815992
Journal volume & issue: Vol. 6
pp. 21552 – 21559

Abstract

The Web has accelerated the creation of digital information on a wide range of subjects. Text classification is widely used to filter emails, classify Web pages, and organize results retrieved by Web browsers. In this paper, we pose the automatic classification of scientific texts as an optimization problem, which allows groups to be obtained from a data set. Evolutionary algorithms have been applied repeatedly to classification problems; however, few approaches address classification problems in which the data attributes are textual. We therefore propose using the Association for Computing Machinery (ACM) taxonomy to measure the similarity between documents, where each document is represented as a set of keywords. According to the results obtained, the algorithm is competitive, which indicates that a knowledge-based genetic algorithm is a viable approach to the classification problem.
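
To make the idea of taxonomy-based similarity between keyword sets concrete, the following is a minimal sketch under stated assumptions, not the authors' method. The abstract does not specify the similarity measure, so the Wu-Palmer measure is used here as a stand-in. The `PARENT` tree is a hypothetical toy taxonomy rather than the real ACM classification, and the function names `path_to_root`, `wu_palmer`, and `doc_similarity` are illustrative.

```python
# Hypothetical toy taxonomy (child -> parent); NOT the real ACM classification tree.
PARENT = {
    "computing": None,
    "machine learning": "computing",
    "information systems": "computing",
    "genetic algorithms": "machine learning",
    "neural networks": "machine learning",
    "text classification": "information systems",
}


def path_to_root(term):
    """Return the list of nodes from `term` up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu-Palmer similarity (assumed measure): 2*depth(lca) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first node on a's path that is also an ancestor of b is the lowest common ancestor.
    lca = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def doc_similarity(doc_a, doc_b):
    """Symmetric average of each keyword's best match in the other document."""
    best_a = [max(wu_palmer(a, b) for b in doc_b) for a in doc_a]
    best_b = [max(wu_palmer(a, b) for a in doc_a) for b in doc_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


doc1 = ["genetic algorithms", "text classification"]
doc2 = ["neural networks"]
print(doc_similarity(doc1, doc2))
```

A genetic algorithm could then use a score of this kind inside its fitness function, for example by rewarding groupings whose members are similar to one another. That pairing is an assumption made for illustration and is not a description of the paper's fitness function.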

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
