Improving Low-resource Dependency Parsing Using Multi-strategy Data Augmentation

XIAN Yan-tuan, GAO Fan-ya, XIANG Yan, YU Zheng-tao, WANG Jian

doi:10.11896/jsjkx.210900036

Jisuanji kexue (Jan 2022)

Affiliations

XIAN Yan-tuan, GAO Fan-ya, XIANG Yan, YU Zheng-tao, WANG Jian: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

DOI: https://doi.org/10.11896/jsjkx.210900036
Journal volume & issue: Vol. 49, no. 1
pp. 73 – 79

Abstract

Dependency parsing aims to identify the syntactic dependencies between the words in a sentence. It provides syntactic features that improve model performance on tasks such as information extraction, automatic question answering and machine translation. The size of the training data has a significant impact on the performance of a dependency parsing model: a lack of training data causes serious unknown-word and overfitting problems. To address low-resource dependency parsing, this paper proposes multiple data augmentation strategies. Synonym substitution effectively expands the training data and alleviates the unknown-word problem, while multiple Mixup augmentation strategies alleviate overfitting and improve the generalization ability of the model. Experimental results on the Universal Dependencies treebanks (UD treebanks) show that the proposed methods effectively improve dependency parsing performance for Thai, Vietnamese and English under small-scale training corpus conditions.
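
The two augmentation ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the synonym dictionary, the replacement rate and the Beta(alpha, alpha) sampling of the mixing weight are all assumptions made for illustration, and the abstract does not say at which layers of the parser Mixup is applied.

```python
import random


def substitute_synonyms(tokens, heads, labels, synonyms, rate=0.3, rng=None):
    """Hypothetical helper: build a new training example by swapping tokens for synonyms.

    Replacement is one-for-one, so the sentence length is unchanged and the
    original head indices and dependency labels remain valid for the new sentence.
    """
    rng = rng or random.Random()
    new_tokens = []
    for tok in tokens:
        candidates = synonyms.get(tok, [])
        if candidates and rng.random() < rate:
            new_tokens.append(rng.choice(candidates))
        else:
            new_tokens.append(tok)
    return new_tokens, list(heads), list(labels)


def mixup(vec_a, vec_b, alpha=0.5, rng=None):
    """Hypothetical helper: interpolate two equal-length representations.

    lam is drawn from Beta(alpha, alpha), as in standard Mixup (an assumption here).
    The same lam would normally weight the two examples' losses during training.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]
    return mixed, lam


# Usage: augment one toy sentence and mix two toy representation vectors.
toy_synonyms = {"big": ["large"]}  # assumed synonym resource
aug_tokens, aug_heads, aug_labels = substitute_synonyms(
    ["the", "big", "dog"], [3, 3, 0], ["det", "amod", "root"],
    toy_synonyms, rate=1.0, rng=random.Random(0),
)
mixed_vec, lam = mixup([1.0, 0.0], [0.0, 1.0], alpha=0.5, rng=random.Random(0))
```

Because substitution keeps the tree annotation intact, each augmented sentence can be added to the treebank directly; the Mixup helper only shows the interpolation step, not how it is wired into a parser.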

Keywords: dependency parsing, low-resource language, mixup data augmentation, synonym substitution, multi-strategy

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
