Journal of Big Data (Sep 2024)

IPerFEX-2023: Indonesian personal financial entity extraction using indoBERT-BiGRU-CRF model

  • Emmanuel Dave,
  • Andry Chowanda

DOI
https://doi.org/10.1186/s40537-024-00987-6
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 24

Abstract

There is minimal research on applications of Indonesian named entity recognition (NER) in a specific domain. This study proposes an Indonesian personal financial entity extraction task that can be used in a financial assistant chatbot system to interpret a user's financial situation for personalization. Because a chatbot is simple to use, it can promote financial management practices among young people as early as possible in their careers. However, the challenge in financial NER is numerical entity extraction, which relies heavily on contextual information and suffers from the out-of-vocabulary (OOV) problem. Therefore, to extract 15 personal financial entities from daily Indonesian discussions (expense-type, expense-amount, income-type, income-amount, asset-type, asset-amount, saving-type, saving-amount, liability-type, liability-amount, family, time, financial-goal, age, and occupation), this research proposes a dataset, IPerFEX-2023, and trains on it a model that combines a Bidirectional Gated Recurrent Unit (BiGRU) and a Conditional Random Field (CRF) with feature embeddings from the pre-trained Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT) model (IndoBERT-BiGRU-CRF). It is compared with the corresponding Bidirectional Long Short-Term Memory (BiLSTM) model (IndoBERT-BiLSTM-CRF) as the baseline. Not only does the IndoBERT-BiGRU-CRF model achieve the best performance, with a 0.73 F1-score, but it is also 14% faster on average than the baseline model because of its simpler unit structure. This paper also discusses future directions, covering a model enhancement strategy based on the error analysis results and the complementary tasks needed to complete personalization.
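To make the extraction task concrete, the following is a minimal Python sketch of how the 15 entity types could be turned into a sequence-labeling tag set and how predicted tags could be decoded back into entities. The BIO tagging scheme, the `decode_spans` helper, and the example sentence are assumptions made for illustration; the abstract does not state the annotation scheme, and the sentence is invented rather than taken from IPerFEX-2023.

```python
from typing import List, Tuple

# The 15 entity types listed in the abstract.
ENTITY_TYPES = [
    "expense-type", "expense-amount", "income-type", "income-amount",
    "asset-type", "asset-amount", "saving-type", "saving-amount",
    "liability-type", "liability-amount", "family", "time",
    "financial-goal", "age", "occupation",
]

# Assumption: a BIO scheme, where B- opens an entity, I- continues it,
# and O marks tokens outside any entity. This gives 2 * 15 + 1 labels.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES
                  for prefix in ("B", "I")]


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: an I- tag that does not continue the open
    entity closes that entity and is itself dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Invented example ("my salary is 5 million per month"), hand-tagged.
tokens = ["gaji", "saya", "5", "juta", "per", "bulan"]
tags = ["B-income-type", "O", "B-income-amount", "I-income-amount",
        "B-time", "I-time"]
entities = decode_spans(tokens, tags)
```

In a pipeline like the one described, the tags would come from the model's CRF layer rather than being written by hand; the decoding step is independent of which encoder (BiGRU or BiLSTM) produced them.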

Keywords