Scientific Reports (Nov 2024)

A method for named entity recognition in social media texts with syntactically enhanced multiscale feature fusion

  • Yuhan Li,
  • Yang Zhou,
  • Xiaofei Hu,
  • Qingxiang Li,
  • Jiali Tian

DOI
https://doi.org/10.1038/s41598-024-78948-5
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 12

Abstract

Social media text is noisy and non-standard, which makes named entity recognition (NER) difficult for existing methods because entities are sparse and the surrounding context is semantically thin. To address these issues, this study proposes SEMFF-NER, an NER method for social media texts that integrates multi-scale features and syntactic information. First, global features are extracted using a Transformer-based encoder (XLNet) with embedded dependency syntactic relations to enhance the semantic representation. Next, sliding windows of different lengths capture local features, which are fed into a bidirectional long short-term memory (BiLSTM) network to obtain multi-level local features. A fusion-attention mechanism then integrates the global contextual information with the multiple local features to predict the optimal entity labels. Extensive experiments on three datasets collected from English social media platforms (WNUT2016, WNUT2017, OntoNotes5.0_English) show that the proposed method performs favorably, and ablation experiments further confirm its viability and effectiveness.
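
The multi-scale step and the fusion step described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not specify the window sizes, the padding scheme, or the form of the fusion-attention mechanism, so the function names (`sliding_windows`, `fuse`), the `<pad>` token, and the use of simple dot-product attention are all assumptions made for illustration. The XLNet encoder and the BiLSTM are omitted entirely; the vectors passed to `fuse` stand in for their outputs.

```python
import math


def sliding_windows(tokens, size):
    """Return one window of `size` tokens per position, padded at the edges.

    Assumption: windows are roughly centered on each token; the paper may
    use a different alignment or stride.
    """
    left = size // 2
    right = size - 1 - left
    padded = ["<pad>"] * left + list(tokens) + ["<pad>"] * right
    return [padded[i:i + size] for i in range(len(tokens))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(global_vec, local_vecs):
    """Fuse one global vector with several local vectors of the same size.

    Assumption: generic dot-product attention. Each local vector is scored
    against the global vector, the scores are normalized with softmax, and
    the weighted sum of local vectors is added to the global vector.
    """
    scores = [sum(g * l for g, l in zip(global_vec, vec)) for vec in local_vecs]
    weights = softmax(scores)
    mixed = [
        sum(w * vec[d] for w, vec in zip(weights, local_vecs))
        for d in range(len(global_vec))
    ]
    return [g + m for g, m in zip(global_vec, mixed)]


if __name__ == "__main__":
    tokens = ["just", "landed", "in", "new", "york"]
    # Hypothetical window sizes; one list of windows per scale.
    for size in (2, 3):
        print(size, sliding_windows(tokens, size))
    # Toy 2-dimensional vectors standing in for encoder outputs.
    print(fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In a full model, each scale's windows would be encoded (in the paper, by a BiLSTM) into one local vector per token, and `fuse` would be applied per token before the label prediction layer.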

Keywords