Jurnal Ilmiah SINERGI (Dec 2023)

Enhancing Indonesian customer complaint analysis: LDA topic modelling with BERT embeddings

  • Mutiara Auliya Khadija,
  • Wahyu Nurharjadmo

DOI
https://doi.org/10.22441/sinergi.2024.1.015
Journal volume & issue
Vol. 28, no. 1
pp. 152 – 162

Abstract

Social media data can be mined, for example by recommender systems, to identify trends and patterns. Customers are free to use social media to ask questions about products, state their demands, and voice their complaints. By mining these data, companies can gain valuable insights into customer preferences, opinions, and sentiments, which can be used to improve products and services, tailor marketing strategies, and raise overall customer satisfaction. Topic modelling is a text mining technique that extracts topics from raw, unlabelled text. Latent Dirichlet Allocation (LDA) is popular in topic modelling research because it is flexible and adaptive. However, it suffers from sparsity, performs poorly on short texts, and does not model correlations between topics, even though such correlations are important in text data. BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabelled text. The results of this research show that LDA and BERT can be combined to model the topics of Indonesian customer complaints. The combination of BERT-Base Multilingual Cased and LDA achieves the highest coherence score, while the combination of BERT-Base Multilingual Uncased and LDA achieves the highest silhouette score. Multilingual BERT therefore has the potential to improve LDA for topic modelling of Indonesian customer complaints.
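The abstract does not state how the LDA and BERT representations are combined or how the silhouette score is computed. One common approach in LDA+BERT pipelines, assumed here and not confirmed as the authors' method, concatenates each document's LDA topic-proportion vector, scaled by a weight `gamma`, with its sentence embedding, clusters the joint vectors, and scores the clusters with the silhouette coefficient. The stdlib-only sketch below illustrates that idea. The names `combine_features`, `kmeans`, and `silhouette_score`, the default `gamma=15.0`, and the toy vectors are all hypothetical stand-ins. In practice the topic vectors would come from a trained LDA model and the embeddings from a BERT-Base Multilingual model, and a library such as scikit-learn would handle the clustering and scoring.

```python
import math
from typing import List, Sequence

Vector = List[float]


def combine_features(lda_topics: Sequence[float], bert_embedding: Sequence[float],
                     gamma: float = 15.0) -> Vector:
    """Concatenate a gamma-scaled LDA topic distribution with a sentence embedding.

    Assumption: this weighted concatenation is one common LDA+BERT scheme, not
    necessarily the paper's. gamma (hypothetical default) sets how strongly the
    LDA part counts relative to the BERT part in distance computations.
    """
    return [gamma * p for p in lda_topics] + list(bert_embedding)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette_score(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient s = (b - a) / max(a, b); needs >= 2 clusters."""
    n = len(points)

    def mean_dist(i: int, cluster: int) -> float:
        dists = [euclidean(points[i], points[j])
                 for j in range(n) if j != i and labels[j] == cluster]
        return sum(dists) / len(dists) if dists else 0.0

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # singleton cluster: its coefficient is taken as 0
        a = mean_dist(i, labels[i])  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, c) for c in set(labels) if c != labels[i])  # separation
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


if __name__ == "__main__":
    # Toy inputs (hypothetical): 2-topic LDA proportions and 3-d stand-in embeddings.
    lda = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    bert = [[0.1, 0.2, 0.0], [0.2, 0.1, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
    points = [combine_features(t, e) for t, e in zip(lda, bert)]
    labels = kmeans(points, k=2)
    print(labels, round(silhouette_score(points, labels), 3))
```

A silhouette score near 1 means documents sit much closer to their own cluster than to the nearest other cluster. Values near 0 or below indicate overlapping or misassigned clusters. Topic coherence, the other metric in the abstract, is computed from word co-occurrence statistics of each topic's top words rather than from these joint vectors, so it is not covered by this sketch.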

Keywords