Comparative Study of Multiclass Text Classification in Research Proposals Using Pretrained Language Models

Eunchan Lee; Changhyeon Lee; Sangtae Ahn

doi:10.3390/app12094522

Applied Sciences (Apr 2022)

Affiliations

Eunchan Lee: School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea
Changhyeon Lee: School of Electronics Engineering, Kyungpook National University, Daegu 41566, Korea
Sangtae Ahn: School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Korea

Journal volume & issue: Vol. 12, no. 9, p. 4522

Abstract

Transformer-based pretrained language models have recently demonstrated strong performance in natural language understanding (NLU) tasks. For example, bidirectional encoder representations from transformers (BERT) has achieved outstanding performance through masked self-supervised pretraining and transformer-based modeling. However, the original BERT may be effective only for English-based NLU tasks, and its effectiveness for other languages such as Korean is limited. The applicability of BERT-based language models pretrained on languages other than English to NLU tasks in those languages must therefore be investigated. In this study, we comparatively evaluated seven BERT-based pretrained language models and their expected applicability to Korean NLU tasks. We used the climate technology dataset, a large Korean text classification dataset of research proposals involving 45 classes. We found that the BERT-based model pretrained on the most recent Korean corpus performed best in Korean multiclass text classification. This suggests the necessity of optimal pretraining for specific NLU tasks, particularly those in languages other than English.
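
As an illustration of the kind of comparison the abstract describes, the sketch below ranks several classifiers by their predictions on a shared multiclass test set. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the abstract does not name its metrics, so accuracy and macro-averaged F1 are assumed here, and the model names, labels, and helper functions (accuracy, macro_f1, rank_models) are hypothetical. A 3-class toy example stands in for the 45-class dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def rank_models(y_true, predictions):
    """Score each model's predictions and sort by macro F1, best first."""
    scores = {
        name: {
            "accuracy": accuracy(y_true, y_pred),
            "macro_f1": macro_f1(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1]["macro_f1"], reverse=True)


# Hypothetical toy data: 3 classes instead of the 45 in the paper.
y_true = [0, 1, 2, 2, 1, 0]
predictions = {
    "model_a": [0, 1, 2, 2, 1, 0],  # hypothetical model, all correct
    "model_b": [0, 1, 2, 1, 1, 0],  # hypothetical model, one error
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: accuracy={score['accuracy']:.3f}, macro_f1={score['macro_f1']:.3f}")
```

Macro averaging weights every class equally, which may matter when some of the classes are rare; whether that matches the paper's setup is an assumption.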

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
