IJAIN (International Journal of Advances in Intelligent Informatics) (May 2024)

Academic expert finding using BERT pre-trained language model

  • Ilma Alpha Mannix,
  • Evi Yulianti

DOI
https://doi.org/10.26555/ijain.v10i2.1497
Journal volume & issue
Vol. 10, no. 2
pp. 280–295

Abstract

Academic expert finding has numerous applications, such as identifying paper reviewers, supporting research collaboration, and enhancing knowledge transfer. For research collaboration in particular, researchers tend to seek collaborators who share similar backgrounds or native languages. Despite its importance, academic expert finding remains relatively unexplored for the Indonesian language. Recent studies have primarily relied on static word embedding techniques such as Word2Vec to match documents with relevant areas of expertise. However, Word2Vec is unable to capture the varying meanings of words in different contexts. To address this research gap, this study employs Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art contextual embedding model, and examines its effectiveness on the task of academic expert finding. The proposed approach uses three variants of BERT, namely IndoBERT (Indonesian BERT), mBERT (Multilingual BERT), and SciBERT (Scientific BERT), which are compared against a static embedding baseline using Word2Vec. Two approaches were employed to rank experts with the BERT variants: feature-based and fine-tuning. We found that IndoBERT outperforms the baseline by 6–9% when using the feature-based approach and by 10–18% with the fine-tuning approach. Our results also show that fine-tuning performs better than the feature-based approach, with an improvement of 1–5%. In conclusion, this research shows that using IndoBERT improves the effectiveness of academic expert finding for the Indonesian language.
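To illustrate the feature-based approach described above, the sketch below encodes a query and expert documents with a pre-trained IndoBERT model and ranks experts by cosine similarity. This is a minimal sketch, not the paper's exact pipeline: it assumes the HuggingFace transformers library, the indobenchmark/indobert-base-p1 checkpoint, and mean pooling over token embeddings; the expert profiles and query are hypothetical.

```python
# Feature-based expert ranking sketch: embed texts with a frozen IndoBERT
# encoder, then score experts by cosine similarity to the query.
# Model name, pooling choice, and data are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed IndoBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(texts):
    """Return one vector per text via mean pooling over non-padding tokens."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Hypothetical expert profiles built from publication text (in Indonesian).
experts = {
    "Expert A": "Penelitian tentang pembelajaran mesin dan pemrosesan bahasa alami.",
    "Expert B": "Studi mengenai jaringan komputer dan keamanan informasi.",
}
query = "temu kembali informasi dengan model bahasa"

doc_vecs = embed(list(experts.values()))
query_vec = embed([query])
scores = torch.nn.functional.cosine_similarity(query_vec, doc_vecs)

# Rank experts from most to least similar to the query topic.
for name, score in sorted(zip(experts, scores.tolist()),
                          key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

The fine-tuning approach would instead update the encoder's weights on labeled query-expert relevance data rather than using the frozen embeddings shown here.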

Keywords