Comparative Analysis of Deep Learning Models for Part of Speech Tagging in the Malay Language

Bakare Mustaphaa Adebayo; Kalaiarasi Sonai Muthu Anbananthen; Saravanan Muthaiyah; Saravanan Nathan Lurudusamy

doi:10.28991/HIJ-2024-05-02-04

HighTech and Innovation Journal (Jun 2024)

Affiliations

Bakare Mustaphaa Adebayo: Faculty of Information Science and Technology, Multimedia University, Melaka 75450
Kalaiarasi Sonai Muthu Anbananthen: Faculty of Information Science and Technology, Multimedia University, Melaka 75450
Saravanan Muthaiyah: School of Business and Technology, International Medical University, Kuala Lumpur 57000
Saravanan Nathan Lurudusamy: Division Consulting & Technology Services, Telekom Malaysia, Kuala Lumpur 50672

DOI: https://doi.org/10.28991/HIJ-2024-05-02-04
Journal volume & issue: Vol. 5, no. 2
pp. 272 – 281

Abstract

Despite its widespread use, Malay remains an under-resourced language in Natural Language Processing (NLP), particularly for Part-of-Speech (POS) tagging, where the scarcity of annotated corpora is the primary obstacle. This study aims to improve the effectiveness and reliability of POS tagging models for under-resourced languages, focusing on Malay. Existing models, which rely on Conditional Random Fields and Hidden Markov Models, exhibit limitations, underscoring the need for more robust approaches. The study compares several deep-learning models with different encoders for POS tagging of Malay sentences. The experiments show that a Bidirectional Long Short-Term Memory (Bi-LSTM) model using pre-trained Bidirectional Encoder Representations from Transformers (BERT) embeddings achieves high accuracy, precision, recall, and F1 scores in predicting tags. With an accuracy of 98.82%, the BERT + Bi-LSTM model outperforms the other models on all evaluated metrics. The combined model also handles both known and unknown words effectively, yielding highly accurate POS tags for Malay sentences.
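
The four metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `tagging_metrics`, the macro-averaging over tags, and the toy tag sequences are all assumptions made for illustration, since the abstract does not state how the scores were averaged.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Token-level accuracy plus macro-averaged precision, recall, and F1.

    Hypothetical helper: macro-averaging is an assumption, not something
    stated in the abstract.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one token is required")

    # Count true positives, false positives, and false negatives per tag.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it did not belong
            fn[g] += 1  # missed the gold tag g

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy tag sequences for illustration only; not data from the study.
gold_tags = ["NOUN", "VERB", "NOUN", "ADJ"]
pred_tags = ["NOUN", "VERB", "ADJ", "ADJ"]
metrics = tagging_metrics(gold_tags, pred_tags)
print(metrics)
```

Macro-averaging weights every tag equally, so rare tags influence the score as much as frequent ones; a frequency-weighted average would be an equally plausible reading of the reported figures.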

Published in HighTech and Innovation Journal

ISSN: 2723-9535 (Online)
Publisher: Ital Publication
Country of publisher: Italy
LCC subjects: Social Sciences: Industries. Land use. Labor: Management. Industrial management: Technological innovations. Automation
Website: https://hightechjournal.org/index.php/HIJ/index
