Deep Learning with Word Embedding Improves Kazakh Named-Entity Recognition

Gulizada Haisa; Gulila Altenbek

doi:10.3390/info13040180

Information (Apr 2022)

Affiliations

Gulizada Haisa: College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Gulila Altenbek: College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China

DOI: https://doi.org/10.3390/info13040180
Journal volume & issue: Vol. 13, no. 4, p. 180

Abstract

Named-entity recognition (NER) is a preliminary step for several text extraction tasks. In this work, we recognize Kazakh named entities with a hybrid neural network model that combines word semantics with multidimensional features and attention mechanisms. There are two major challenges. First, Kazakh is an agglutinative, morphologically rich language, which makes NER difficult because of data sparsity. Second, Kazakh named entities have unclear boundaries, polysemy, and nesting. A common strategy for handling data sparsity is subword segmentation, so we combined the semantics of words and stems, with the stems obtained from a Kazakh morphological analysis system. Additionally, we constructed a graph structure of entities, with words, entities, and entity categories as nodes and inclusion relations as edges, and updated the nodes using a gated graph neural network (GGNN) with an attention mechanism. Finally, a conditional random field (CRF) layer produces the final labels. Experimental results show that our method consistently outperforms all previous methods, reaching an F1 score of 88.04%.
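
To make the graph-update step concrete, here is a minimal NumPy sketch of one gated propagation step over a toy entity graph. It is an illustration under stated assumptions, not the authors' implementation: the node layout, the dimensions, the parameter names (`W_msg`, `W_z`, and so on), and the random initialization are all hypothetical, and the attention mechanism mentioned in the abstract is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, adj, params):
    """One gated propagation step: sum neighbour messages, then apply a GRU-style update."""
    a = adj @ h @ params["W_msg"]                        # aggregated messages per node
    z = sigmoid(a @ params["W_z"] + h @ params["U_z"])   # update gate
    r = sigmoid(a @ params["W_r"] + h @ params["U_r"])   # reset gate
    h_tilde = np.tanh(a @ params["W_h"] + (r * h) @ params["U_h"])  # candidate state
    return (1.0 - z) * h + z * h_tilde

def init_params(dim, seed=0):
    # Hypothetical parameter set; small random weights for illustration only.
    rng = np.random.default_rng(seed)
    names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
    return {name: rng.normal(scale=0.1, size=(dim, dim)) for name in names}

# Toy graph (assumed layout): nodes 0-2 are words, node 3 is an entity,
# node 4 is an entity category. Inclusion relations are stored as
# symmetric edges: words 0 and 1 belong to entity 3, entity 3 to category 4.
NUM_NODES, DIM = 5, 8
adj = np.zeros((NUM_NODES, NUM_NODES))
for i, j in [(0, 3), (1, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

h0 = np.random.default_rng(1).normal(size=(NUM_NODES, DIM))  # initial node states
params = init_params(DIM)
h1 = ggnn_step(h0, adj, params)  # node states after one propagation step
print(h1.shape)
```

In a full tagger of the kind the abstract describes, the updated word-node states would feed a CRF layer that selects the final label sequence; that stage is not shown here.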

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
