Information (Jan 2022)
Learning the Morphological and Syntactic Grammars for Named Entity Recognition
Abstract
In some languages, Named Entity Recognition (NER) is severely hindered by complex linguistic structures, such as inflection, which confuse data-driven models trying to perceive a word’s actual meaning. This work aims to alleviate these problems by introducing a novel neural network based on morphological and syntactic grammars. Experiments were performed on four Nordic languages, which have rich grammatical rules. The model is named the NorG network (Nor: Nordic languages, G: grammar). In addition to learning from the text content, the NorG network also learns from the word’s written form, its POS tag, and its dependency relations. The proposed network uses a bidirectional Long Short-Term Memory (Bi-LSTM) layer to capture word-level grammar and a bidirectional Graph Attention (Bi-GAT) layer to capture sentence-level grammar. Experimental results on the four languages show that the grammar-assisted network significantly improves over the baselines. We also investigate how the NorG network behaves with respect to each grammar component through exploratory experiments.
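The sketch below illustrates the kind of architecture the abstract describes: word and POS features feed a Bi-LSTM for word-level grammar, and a graph-attention layer applied over the dependency tree in both edge directions captures sentence-level grammar. This is a minimal PyTorch sketch under assumed dimensions and module names (`NorGSketch`, `SimpleGraphAttention`); the character-level writing-form features are omitted for brevity, and none of this is the authors' actual implementation.

```python
# Minimal sketch of a NorG-style NER model; all names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class SimpleGraphAttention(nn.Module):
    """Single-head attention restricted to dependency-tree neighbours (a simplified GAT)."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (batch, seq, dim); adj: (batch, seq, seq) dependency adjacency mask
        scores = self.query(x) @ self.key(x).transpose(1, 2) / x.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        return torch.softmax(scores, dim=-1) @ self.value(x)


class NorGSketch(nn.Module):
    def __init__(self, vocab, pos_tags, labels, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.pos_emb = nn.Embedding(pos_tags, dim)
        # Word-level grammar: Bi-LSTM over concatenated word + POS features.
        self.bilstm = nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
        # Sentence-level grammar: attention over dependency edges in both directions.
        self.gat_fwd = SimpleGraphAttention(2 * dim)
        self.gat_bwd = SimpleGraphAttention(2 * dim)
        self.classifier = nn.Linear(4 * dim, labels)

    def forward(self, words, pos, adj):
        feats = torch.cat([self.word_emb(words), self.pos_emb(pos)], dim=-1)
        hidden, _ = self.bilstm(feats)
        graph = torch.cat(
            [self.gat_fwd(hidden, adj), self.gat_bwd(hidden, adj.transpose(1, 2))],
            dim=-1,
        )
        return self.classifier(graph)  # per-token NER label scores


# Toy usage: 2 sentences of 5 tokens, self-loop-only dependency mask as a placeholder.
model = NorGSketch(vocab=1000, pos_tags=20, labels=9)
words = torch.randint(0, 1000, (2, 5))
pos = torch.randint(0, 20, (2, 5))
adj = torch.eye(5).unsqueeze(0).repeat(2, 1, 1)
print(model(words, pos, adj).shape)  # torch.Size([2, 5, 9])
```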
Keywords