Survey on Sequence Data Augmentation

GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao

doi:10.3778/j.issn.1673-9418.2012062

Jisuanji kexue yu tansuo (Jul 2021)

Affiliations

GE Yizhou, XU Xiang, YANG Suorong, ZHOU Qing, SHEN Furao: 1. Science and Technology on Communication Information Security Control Laboratory, Jiaxing, Zhejiang 314033, China 2. No.36 Research Institute, China Electronics Technology Group Corporation, Jiaxing, Zhejiang 314033, China 3. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.2012062
Journal volume & issue: Vol. 15, no. 7
pp. 1207 – 1219

Abstract

To pursue higher accuracy, deep learning models are becoming increasingly complex, with ever deeper networks. The growing number of parameters means that more data are needed to train these models. However, manually labeling data is costly, and in some fields objective constraints make data difficult to collect. As a result, insufficient data is a very common problem. Data augmentation alleviates this problem by artificially generating new data. Its success in computer vision has led researchers to consider similar methods for sequence data. This paper describes not only time-domain methods such as flipping and cropping but also augmentation methods in the frequency domain. In addition to experience-based and knowledge-based methods, it gives detailed descriptions of machine learning models used for automatic data generation, such as GAN. It covers methods that have been widely applied to sequence data such as text, audio and time series, along with their satisfactory performance in tasks such as medical diagnosis and emotion classification. Despite the differences in data type, these methods are designed around similar ideas. Using these ideas as a thread, the paper introduces data augmentation methods applied to different types of sequence data and offers some discussion and prospects.

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
