Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models

Elmurod Kuriyozov; Sanatbek Matlatipov

doi:10.3390/proceedings2019021037

Proceedings (Aug 2019)


Affiliations

Elmurod Kuriyozov: CITIC, Grupo LYS, Departamento de Computación, Facultade de Informática, Campus de Elviña, Universidade da Coruña, 15071 A Coruña, Spain
Sanatbek Matlatipov: Applied Mathematics and Computer Analysis Department, National University of Uzbekistan, University Str. 4, Tashkent 100174, Uzbekistan

Journal volume & issue: Vol. 21, no. 1
p. 37

Abstract


Making natural language processing technologies available for low-resource languages is an important goal for improving access to technology within their speaker communities. In this paper, we provide the first annotated corpora for polarity classification for the Uzbek language. Our methodology involves collecting a medium-sized, manually annotated dataset and a larger dataset automatically translated from existing resources. We then use these datasets to train sentiment analysis models for Uzbek, using both traditional machine learning techniques and recent deep learning models.

Published in Proceedings

ISSN: 2504-3900 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: General Works
Website: http://www.mdpi.com/journal/proceedings