Journal of Open Innovation: Technology, Market and Complexity (Sep 2024)

Indonesian disaster named entity recognition from multi source information using bidirectional LSTM (BiLSTM)

  • Guruh Fajar Shidik,
  • Filmada Ocky Saputra,
  • Galuh Wilujeng Saraswati,
  • Nurul Anisa Sri Winarsih,
  • Muhammad Syaifur Rohman,
  • Ricardus Anggi Pramunendar,
  • Edi Jaya Kusuma,
  • Danny Oka Ratmana,
  • Valentijn Venus,
  • Pulung Nurtantio Andono,
  • Zainal Arifin Hasibuan

Journal volume & issue
Vol. 10, no. 3
p. 100358

Abstract


Precise logistics support is essential after a disaster: it must be timely, accurate, targeted, and based on actual needs. However, obtaining sufficient and accurate information about logistics distribution locations remains a key problem, which Named Entity Recognition (NER) can address. In recent years, Indonesian digital news media and social media accounts have emerged as a promising source for building a disaster data corpus. This study applied NER to extract and identify named entities from text-based information, particularly from Indonesian digital news media. In addition to the standard NER entity types, this study introduced new entities specialized for disaster-related information: DISASTER, SCALE, SUPPLIES, CASUALTIES, and OUTSIDE. The resulting Indonesian-language disaster corpus had an imbalanced distribution of entity classes; to overcome this problem, random oversampling was applied. A BiLSTM model was then used to recognize each entity in new textual information, and its performance was evaluated with the proposed Indonesian disaster corpus as the training reference. Several optimization algorithms for training the BiLSTM were compared. BiLSTM performance improved with Adam optimization and a balanced corpus, reaching 93.4% precision, 82.4% recall, and an 87.5% F1-score. The BiLSTM network captured long-range dependencies in the sequential data used for NER. Oversampling ensured that the proposed NER model could precisely recognize all entities and reduced biased results. Thus, the BiLSTM method can better identify entities in the textual corpus of Indonesian disaster-related online news.
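The random oversampling step can be pictured at the sentence level. The following is a minimal sketch under assumptions the abstract does not state: each training sentence carries BIO tags such as `B-DISASTER` or `I-SUPPLIES`, the tag `O` marks non-entity tokens (assumed here to correspond to the OUTSIDE label), and oversampling duplicates randomly chosen sentences that contain an under-represented entity type until that type appears in as many sentences as the most frequent type originally did. The helper names `entity_types`, `type_counts`, and `random_oversample`, the `LOC` label, and the three toy Indonesian sentences are hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical sentence representation: (tokens, BIO tags).
Sentence = tuple[list[str], list[str]]


def entity_types(tags: list[str]) -> set[str]:
    """Return the entity types (e.g. 'DISASTER') present in a BIO tag sequence.

    'O' marks tokens outside any entity (assumed to match the OUTSIDE label).
    """
    return {tag.split("-", 1)[1] for tag in tags if tag != "O"}


def type_counts(sentences: list[Sentence]) -> Counter:
    """Count how many sentences contain each entity type."""
    counts: Counter = Counter()
    for _, tags in sentences:
        counts.update(entity_types(tags))
    return counts


def random_oversample(sentences: list[Sentence], seed: int = 0) -> list[Sentence]:
    """Duplicate random sentences containing rare entity types until every type
    appears in at least as many sentences as the most frequent type originally did."""
    rng = random.Random(seed)
    result = list(sentences)
    counts = type_counts(result)
    if not counts:
        return result
    target = max(counts.values())
    # Visit the rarest types first; sorted() snapshots the original counts.
    for etype, _ in sorted(counts.items(), key=lambda kv: kv[1]):
        pool = [s for s in sentences if etype in entity_types(s[1])]
        while counts[etype] < target:
            extra = rng.choice(pool)
            result.append(extra)
            # A duplicated sentence raises the count of every type it contains.
            counts.update(entity_types(extra[1]))
    return result


# Toy corpus: "earthquake in Cianjur", "flood hits Semarang", "rice aid arrives".
corpus = [
    (["Gempa", "di", "Cianjur"], ["B-DISASTER", "O", "B-LOC"]),
    (["Banjir", "melanda", "Semarang"], ["B-DISASTER", "O", "B-LOC"]),
    (["Bantuan", "beras", "tiba"], ["O", "B-SUPPLIES", "O"]),
]
balanced = random_oversample(corpus, seed=42)
print(type_counts(balanced))
```

Duplicating whole sentences, rather than individual tokens, keeps each token sequence intact for a sequence model such as a BiLSTM, although any entities that co-occur with the rare one are duplicated along with it.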

Keywords