IEEE Access (Jan 2020)

Named-Entity Recognition Using Automatic Construction of Training Data From Social Media Messaging Apps

  • Seungwook Lee,
  • Youngjoong Ko

DOI
https://doi.org/10.1109/ACCESS.2020.3043261
Journal volume & issue
Vol. 8
pp. 222724 – 222732

Abstract

In recent years, data from social media messaging apps have become a valuable source of information, including critical clues and evidence for legal trials and criminal investigations. Although these data can be of various types, most are natural language text, so extracting information from them efficiently requires practical natural language processing approaches. This study proposes applying a deep-learning-based named-entity recognition (NER) system to these messaging data for information extraction. In addition, it presents a system that automatically constructs NER training data for the deep-learning models using distant supervision. Because social media messaging app data generally contain a significant amount of noise, such as typographical and word-spacing errors, extracting information from them effectively requires a NER system that is robust to such noise. The results demonstrate that the proposed approach outperforms a NER system trained on manually labeled data.
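
The abstract does not describe how the training data are constructed, so the following is only a minimal sketch of distant supervision in its most common generic form: unlabeled tokens are matched against an entity dictionary and tagged in the BIO scheme. The gazetteer contents, the entity types, and the function name `distant_label` are all hypothetical, chosen for illustration, and should not be read as the authors' actual system.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer (an assumption, not from the paper):
# lowercased entity surface forms, as token tuples, mapped to entity types.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("seoul",): "LOC",
    ("gangnam", "station"): "LOC",
    ("kim", "minsu"): "PER",
}


def distant_label(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in the gazetteer."""
    tags = ["O"] * len(tokens)
    if not gazetteer:
        return tags
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in gazetteer)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in gazetteer:
                entity_type = gazetteer[span]
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    message = "Meet Kim Minsu at Gangnam Station tomorrow".split()
    for token, tag in zip(message, distant_label(message)):
        print(token, tag)
```

Exact dictionary matching of this kind misses entities that are misspelled or incorrectly spaced, which is the kind of noise the abstract says messaging data contain; a practical system would need additional handling for such cases.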

Keywords