Extracting a human sentiment from a text using a recurrent neural network and the Keras library (Extragerea unui sentiment uman dintr-un text folosind o rețea neuronală recurentă și biblioteca Keras)

Paul TEODORESCU

doi:10.33436/v30i3y202009

Revista Română de Informatică și Automatică (Sep 2020)

Affiliations

Paul TEODORESCU: Institutul Naţional de Cercetare-Dezvoltare în Informatică – ICI București

DOI: https://doi.org/10.33436/v30i3y202009
Journal volume & issue: Vol. 30, no. 3
pp. 119 – 132

Abstract

This paper examines how a computer can extract a simple human sentiment, "liked" or "disliked", from a text. Concretely, the computer learns to place a movie review in one of two categories, positive or negative. Starting from input values and their associated output values, called labels, a model built with supervised learning learns to recognize the correct output (here the digit 0 or 1, where 0 represents a negative sentiment and 1 a positive one). The objective is therefore to predict the sentiment, encoded as 0 or 1, for a new input once the model has been trained. The exercise uses the Keras API built on TensorFlow, a set of movie reviews taken from IMDB, and a recurrent neural network (RNN) with LSTM (Long Short-Term Memory) cells, which retain a memory of previously encountered words. Keras ships with a set of 50,000 movie reviews that are already pre-processed (as explained in the paper). After the network is fed these texts (25,000 for training and another 25,000 for testing), the model built with Keras uses the relationships between words to predict, with good accuracy, the positive or negative sentiment, in other words the polarity of the text. Sentiment analysis has a wide range of applications, from social media monitoring and voice of the customer (VOC) programs, through the analysis of tweets and Facebook posts, to business analytics based on text analysis.
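
As a rough illustration of what "pre-processed" can mean for such reviews, the sketch below turns each text into a fixed-length list of integers and maps a predicted probability to the 0/1 label. It is a toy example, not the paper's code. The helper names (`build_vocab`, `encode`, `pad`, `to_label`), the reserved indices, and the 0.5 threshold are all assumptions made for illustration. In Keras itself, the ready-made integer sequences come from `keras.datasets.imdb.load_data` and padding is done with `pad_sequences`, which use a somewhat different index layout.

```python
from collections import Counter

# Reserved indices in this toy scheme (an assumption, not the Keras layout).
PAD, OOV = 0, 1


def build_vocab(texts, num_words):
    """Map the num_words most frequent words to integers starting at 2."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {
        word: rank + 2
        for rank, (word, _) in enumerate(counts.most_common(num_words))
    }


def encode(text, vocab):
    """Replace each word by its index; unknown words become OOV."""
    return [vocab.get(word, OOV) for word in text.lower().split()]


def pad(seq, maxlen):
    """Keep the last maxlen tokens and left-pad shorter sequences with PAD."""
    seq = seq[-maxlen:]
    return [PAD] * (maxlen - len(seq)) + seq


def to_label(probability, threshold=0.5):
    """Map a model output in [0, 1] to 1 (positive) or 0 (negative)."""
    return 1 if probability >= threshold else 0


# Toy usage with a made-up corpus.
corpus = ["good good good movie movie bad"]
vocab = build_vocab(corpus, num_words=2)
sequence = pad(encode("good bad movie", vocab), maxlen=5)
```

Padding on the left and truncating from the front mirrors the default behaviour of Keras `pad_sequences`, so that the most recent words sit closest to the end of the sequence read by the LSTM.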

Published in Revista Română de Informatică și Automatică

ISSN: 1220-1758 (Print); 1841-4303 (Online)
Publisher: ICI Publishing House
Country of publisher: Romania
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Automation; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://rria.ici.ro/?lang=en
