Multilabel Text Classification in News Articles Using Long-Term Memory with Word2Vec

Winda Kurnia Sari; Dian Palupi Rini; Reza Firsandaya Malik; Iman Saladin B. Azhar

doi:10.29207/resti.v4i2.1655

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Apr 2020)

Affiliations

Winda Kurnia Sari: Universitas Sriwijaya
Dian Palupi Rini: Universitas Sriwijaya
Reza Firsandaya Malik: Communication Network and Information Security Research Lab
Iman Saladin B. Azhar: Universitas Sriwijaya

DOI: https://doi.org/10.29207/resti.v4i2.1655
Journal volume & issue: Vol. 4, no. 2
pp. 276 – 285

Abstract

Multilabel text classification is the task of assigning a text to one or more categories. As with other machine learning tasks, multilabel classification performance is limited when labeled data is scarce, which makes it difficult to capture semantic relationships. This study requires a multilabel text classification technique that can assign news articles to four labels. Deep learning has been proposed as a way to address these problems. Deep learning methods used for text classification include Convolutional Neural Networks, Autoencoders, Deep Belief Networks, and Recurrent Neural Networks (RNN). RNN is one of the most popular architectures in natural language processing (NLP) because its recurrent structure is well suited to processing variable-length text. The method proposed in this study is an RNN with the Long Short-Term Memory (LSTM) architecture. The models are trained through trial-and-error experiments using LSTM with 300-dimensional Word2Vec word embedding features. By tuning the parameters and comparing eight proposed LSTM models on a large-scale dataset, the study shows that LSTM with Word2Vec features can achieve good performance in text classification. The results show that the fifth model obtains the highest accuracy, 95.38, with an average precision, recall, and F1-score of 95. In addition, the plotted results of the seventh and eighth models are close to a good fit.
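To illustrate why a recurrent structure suits variable-length text, the following is a minimal sketch of a standard LSTM cell applied to a sequence of word vectors, written in plain Python. It is not the authors' implementation: the function names (`init_lstm`, `lstm_step`, `encode`), the random weight initialization, and the tiny dimensions are assumptions made for illustration. The study uses 300-dimensional Word2Vec embeddings, whereas this sketch uses 5-dimensional placeholder vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, seed=0):
    """Create small random weights for the four LSTM gates (hypothetical init)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One weight matrix and bias per gate:
    # i = input gate, f = forget gate, o = output gate, g = candidate cell state.
    return {
        gate: {"W": matrix(hidden_dim, input_dim + hidden_dim),
               "b": [0.0] * hidden_dim}
        for gate in "ifog"
    }


def lstm_step(params, x, h, c):
    """One LSTM time step: consume word vector x, update hidden state h and cell state c."""
    z = x + h  # list concatenation of the input and the previous hidden state

    def affine(gate):
        W, b = params[gate]["W"], params[gate]["b"]
        return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

    i = [sigmoid(a) for a in affine("i")]
    f = [sigmoid(a) for a in affine("f")]
    o = [sigmoid(a) for a in affine("o")]
    g = [math.tanh(a) for a in affine("g")]

    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(params, vectors, hidden_dim):
    """Run the cell over a sequence of any length; return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in vectors:
        h, c = lstm_step(params, x, h, c)
    return h


# Two "articles" of different lengths, each word a 5-dimensional placeholder vector.
params = init_lstm(input_dim=5, hidden_dim=4)
short_article = [[0.1] * 5 for _ in range(3)]
long_article = [[0.1] * 5 for _ in range(7)]
h_short = encode(params, short_article, hidden_dim=4)
h_long = encode(params, long_article, hidden_dim=4)
```

Both articles yield a hidden state of the same fixed size regardless of their length. In a classifier, a final layer (not shown here) would map that state to scores for the four labels.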

Keywords: recurrent neural network, long short-term memory, multi-label classification, glove

Published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

ISSN: 2580-0760 (Online)
Publisher: Ikatan Ahli Informatika Indonesia
Country of publisher: Indonesia
LCC subjects: Technology: Engineering (General). Civil engineering (General): Systems engineering; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://jurnal.iaii.or.id/index.php/RESTI
