Открытое образование (Open Education), Moscow (Nov 2023)

Comparison of Deep Learning Sentiment Analysis Methods, Including LSTM and Machine Learning

  • Jean Max T. Habib,
  • A. A. Poguda

DOI
https://doi.org/10.21686/1818-4243-2023-4-60-71
Journal volume & issue
Vol. 27, no. 4
pp. 60–71

Abstract

Purpose of research. The study evaluates selected machine learning models by their speed and effectiveness in processing data for the analysis of sentiment, or consumer opinion, in business intelligence. To situate the work among existing developments, it reviews modern sentiment analysis methods and models and sets out their advantages and disadvantages.

Materials and methods. Sentiment analysis pipelines built on existing methods and models must be adapted to the rapid changes in today's information flows. Researchers therefore need to explore ways of updating existing tools, either by combining them or by extending them for modern tasks, so that the results these tools produce can be interpreted more clearly. We compare several deep learning models, including convolutional neural networks (CNN), recurrent neural networks (RNN), and bidirectional long short-term memory (BiLSTM) networks, evaluated with different word embedding approaches, including Bidirectional Encoder Representations from Transformers (BERT) and its variants, FastText, and Word2Vec. Data augmentation was performed with a simple data augmentation approach. The project applies natural language processing (NLP), deep learning, and models such as LSTM, CNN, SVM with TF-IDF, AdaBoost, and Naive Bayes, as well as combinations of these models.

Results. The study produced model predictions on user reviews, verified them, and compared the accuracy of the individual models and of their combinations, such as CNN with LSTM, to identify the most accurate one; however, SVM with TF-IDF vectorization proved the most effective for this unbalanced data set. The constructed model achieved the following scores: ROC AUC 0.82, accuracy 0.92, F1 0.82, precision 0.82, and recall 0.82. Further research and model implementation may yield a better model.

Conclusion.
Natural language text analysis has advanced considerably in recent years, and it is possible that such problems will be fully solved in the near future. Of the ML models and the CNN combined with LSTM that were compared, SVM with the TF-IDF vectorizer proved the most effective for this unbalanced data set. In general, both deep learning and feature-based selection methods can be used to solve some of the most pressing problems. Deep learning is useful when the most relevant features are not known in advance, while feature-based … classification algorithm. A combination of both approaches can also be used to further improve the efficiency of the algorithm.
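The abstract reports ROC AUC, accuracy, F1, precision, and recall but does not say how they were computed. The following is a minimal pure-Python sketch, not the authors' evaluation code, showing how these scores are defined for a binary sentiment classifier. It assumes labels 1 (positive review) and 0 (negative review) and uses macro-averaged F1 as one possible averaging scheme; the averaging actually used in the study is not stated. All function names and the toy data are hypothetical.

```python
"""Hypothetical sketch: evaluation metrics for a binary sentiment classifier."""


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn), treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    # Fraction of examples whose predicted label equals the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1, so the minority class counts as much as the majority.
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


def roc_auc(y_true, scores, positive=1):
    # AUC = probability that a random positive example is scored above a random
    # negative one (ties count 1/2). O(n_pos * n_neg) pairs, fine for a sketch.
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, unbalanced data: 8 positive reviews, 2 negative (invented for illustration).
    y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.85, 0.7, 0.95, 0.5, 0.75, 0.65, 0.55, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5
    print(accuracy(y_true, y_pred))            # 0.9
    print(round(macro_f1(y_true, y_pred), 3))  # 0.804
    print(roc_auc(y_true, scores))             # 0.9375
```

With these toy labels, accuracy (0.9) is higher than macro-averaged F1 (about 0.80) because the classifier gets the majority class right but misses half of the minority class; a gap of this kind between accuracy and the other scores can appear on unbalanced review data. Because F1 = 2PR/(P+R), precision and recall of 0.82 each give F1 = 0.82.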

Keywords