Adaptivni Sistemi Avtomatičnogo Upravlinnâ (Aug 2020)
Data augmentation with foreign language content in text classification using machine learning
Abstract
The object of research is a data augmentation method for text classification problems solved with machine learning. The method is examined using sentiment analysis of hotel visitor reviews as an example. It is shown that datasets of insufficient volume or representativeness require special methods for increasing the amount of data in them. The aim of the work is to improve the accuracy of a neural network in text classification tasks by increasing the amount of data. To achieve this goal, it is proposed to use text data written in languages of other families, translated into the target language with Google Translate. Russian was chosen as the target language. To reduce the influence of the model itself on the results, a simple neural network is used: a multilayer perceptron with variations in its structural parameters. The article investigates how the considered data augmentation method affects the resulting accuracy of the network. The experimental results show that using this method is expedient in a number of tasks. Ref. 7, fig. 3, tab. 3
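The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `augment_with_translations`, the `Review` type, and the toy lookup table standing in for a machine translation service are all hypothetical names introduced for illustration. The sketch assumes that translation preserves the sentiment label, so each foreign-language review keeps its label after translation.

```python
from typing import Callable, List, Tuple

# Hypothetical record type: (review text, sentiment label), 1 = positive, 0 = negative.
Review = Tuple[str, int]


def augment_with_translations(
    target_reviews: List[Review],
    foreign_reviews: List[Review],
    translate: Callable[[str], str],
) -> List[Review]:
    """Return the target-language reviews plus translated foreign reviews.

    Labels are carried over unchanged, on the assumption that translation
    preserves sentiment. Empty translations and exact duplicates are skipped.
    """
    augmented = list(target_reviews)
    seen = {text for text, _ in target_reviews}
    for text, label in foreign_reviews:
        translated = translate(text).strip()
        if translated and translated not in seen:
            augmented.append((translated, label))
            seen.add(translated)
    return augmented


# Toy stand-in for a machine translation service (assumption, not a real API).
_TOY_DICTIONARY = {
    "great hotel": "отличный отель",
    "dirty room": "грязный номер",
}


def toy_translate(text: str) -> str:
    # Returns an empty string when the phrase is not in the toy dictionary.
    return _TOY_DICTIONARY.get(text.lower(), "")


russian = [("отличный отель", 1), ("ужасный сервис", 0)]
english = [("Great hotel", 1), ("Dirty room", 0), ("Unknown phrase", 1)]

train_set = augment_with_translations(russian, english, toy_translate)
print(len(train_set))
```

In practice, `translate` would wrap a real translation service, and the enlarged `train_set` would then be vectorized and passed to the classifier. Skipping duplicates keeps translated reviews from simply repeating examples already present in the target-language data.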
Keywords