Incorporating Part of Speech and Tonal Features for Vietnamese Grammatical Error Detection

ZHANG Zhou, ZHU Jun-guo, YU Zheng-tao

doi:10.11896/jsjkx.210900247

Jisuanji kexue (Nov 2022)

Affiliations

ZHANG Zhou, ZHU Jun-guo, YU Zheng-tao: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

DOI: https://doi.org/10.11896/jsjkx.210900247
Journal volume & issue: Vol. 49, no. 11
pp. 221 – 227

Abstract

The BERT pre-trained language model removes the tones of syllables when segmenting Vietnamese words, which causes some semantic information to be lost while training a grammatical error detection model. To address this problem, an approach that combines part-of-speech and tonal features is proposed to restore the semantic information of the input syllables. The grammatical error detection task also suffers from insufficient training data, because labeled Vietnamese data is scarce. To address this second problem, a data augmentation algorithm is designed to generate a large number of erroneous texts from a correct corpus. Experimental results on Vietnamese Wikipedia and news corpora show that the proposed method achieves the highest F0.5 and F1 scores on the test set, indicating that it improves detection performance. Both the proposed method and the baseline model improve gradually as the scale of the generated data increases, indicating that the proposed data augmentation algorithm is effective.
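
To make the tone-loss problem concrete, the following is a minimal sketch, not the authors' implementation. It assumes that tones can be separated from a syllable through Unicode decomposition; the function name `split_tone` and the tone labels are hypothetical. It shows that syllables differing only in tone collapse to the same string once tones are stripped, and that the tone can be kept as a separate feature alongside the stripped syllable.

```python
import unicodedata

# Combining marks for the five marked Vietnamese tones.
# Vowel-quality marks (circumflex, breve, horn) are deliberately not listed,
# so they survive stripping. Labels are illustrative, not from the paper.
TONE_MARKS = {
    "\u0300": "huyen",  # grave accent
    "\u0301": "sac",    # acute accent
    "\u0303": "nga",    # tilde
    "\u0309": "hoi",    # hook above
    "\u0323": "nang",   # dot below
}


def split_tone(syllable):
    """Return (syllable without its tone mark, tone label).

    Hypothetical helper: an unmarked syllable is labeled "ngang" (level tone).
    """
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = "ngang"
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]  # remember the tone, drop the mark
        else:
            kept.append(ch)
    # Recompose so remaining vowel-quality marks form single characters again.
    return unicodedata.normalize("NFC", "".join(kept)), tone


if __name__ == "__main__":
    for s in ["ma", "m\u00e1", "m\u00e0"]:
        print(s, split_tone(s))
```

Under these assumptions, `split_tone("má")` and `split_tone("mà")` both return the base syllable `ma`, with tone labels `sac` and `huyen` respectively. A model that sees only the base syllable cannot tell the two words apart, whereas the tone label could be embedded and fused with the syllable representation.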

Keywords: pre-trained language model; Vietnamese grammatical error detection; feature fusion; data augmentation

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
