IEEE Access (Jan 2020)

Construction of Machine-Labeled Data for Improving Named Entity Recognition by Transfer Learning

  • Juae Kim
  • Youngjoong Ko
  • Jungyun Seo

DOI: https://doi.org/10.1109/ACCESS.2020.2981361
Journal volume & issue: Vol. 8, pp. 59684–59693

Abstract

Deep neural networks (DNNs) require a large amount of manually labeled training data to achieve strong performance. However, manual labeling is laborious and costly. In this study, we propose a method for automatically generating training data and using the generated data effectively to reduce the labeling cost. The training data (called “machine-labeled data”) is generated with a bagging-based bootstrapping approach. However, using machine-labeled data alone does not guarantee high performance because automatic labeling introduces errors. To reduce the impact of mislabeling, we apply a transfer learning approach. The effectiveness of the proposed method is verified with two DNN-based named entity recognition (NER) models: bidirectional LSTM-CRF and vanilla BERT. We conducted NER experiments in two languages, English and Korean. On three Korean NER datasets, the proposed method achieves average F1 scores of 78.87% (a 3.9 percentage point improvement) with bidirectional LSTM-CRF and 82.08% (a 1 percentage point improvement) with BERT. In English, performance increases by an average of 0.45 percentage points with the two DNN-based models. The proposed NER systems outperform the baseline systems in both languages without requiring additional manual labeling.
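The following is a minimal sketch, not the authors' code, of the two-stage transfer learning scheme the abstract describes: a tagger is first trained on the large, automatically generated machine-labeled set, then the same parameters are fine-tuned on the smaller manually labeled set. A plain BiLSTM token classifier stands in for the paper's BiLSTM-CRF/BERT models, the datasets are random placeholders, and the bagging-based bootstrapping step that would produce the machine-labeled data is not shown.

```python
# Hypothetical illustration of pre-training on noisy machine-labeled data,
# then fine-tuning (transferring) on gold-standard data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB, TAGS, MAXLEN = 1000, 5, 20   # placeholder vocabulary, tag set, sentence length

class BiLSTMTagger(nn.Module):
    """Toy BiLSTM token classifier (no CRF layer, unlike the paper's model)."""
    def __init__(self, vocab=VOCAB, dim=64, tags=TAGS):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim, padding_idx=0)
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * dim, tags)     # per-token tag scores

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 would mark padded positions
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x).view(-1, TAGS), y.view(-1))
            loss.backward()
            opt.step()

def fake_split(n):
    """Placeholder for real NER data (token ids and tag ids)."""
    x = torch.randint(1, VOCAB, (n, MAXLEN))
    y = torch.randint(0, TAGS, (n, MAXLEN))
    return DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

machine_labeled = fake_split(2000)   # large, noisy, automatically labeled
gold_labeled = fake_split(200)       # small, manually labeled

model = BiLSTMTagger()
train(model, machine_labeled, epochs=3, lr=1e-3)   # stage 1: pre-train on machine-labeled data
train(model, gold_labeled, epochs=5, lr=5e-4)      # stage 2: fine-tune on gold data
```

The key design point is that stage 2 starts from the parameters learned in stage 1 rather than from scratch, so the gold data corrects, rather than competes with, what was learned from the noisy machine-labeled data.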

Keywords