International Journal of Computational Intelligence Systems (Oct 2023)

Design of a Modified Transformer Architecture Based on Relative Position Coding

  • Wenfeng Zheng,
  • Gu Gong,
  • Jiawei Tian,
  • Siyu Lu,
  • Ruiyang Wang,
  • Zhengtong Yin,
  • Xiaolu Li,
  • Lirong Yin

DOI
https://doi.org/10.1007/s44196-023-00345-z
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 17

Abstract

Natural language processing (NLP) based on deep learning delivers strong performance for generative dialogue systems, and the transformer model marked a new advance in NLP after the advent of word vectors. In this paper, a Chinese generative dialogue system based on the transformer is designed. The system is built from a multi-layer transformer decoder alone, and an incomplete mask design realizes one-way language generation: tokens of the question can attend to context in both directions, while the reply is generated autoregressively in one direction only. These design choices make one-way generation for dialogue tasks more logical and coherent, and the system outperforms traditional dialogue schemes. Considering the weakness of absolute position coding on long-distance information, we propose an improvement based on relative position coding, motivate it theoretically, and verify it in subsequent experiments. In the transformer module, the self-attention formula is modified to inject relative position information, replacing the absolute position coding of the position embedding layer. The modified model achieves good results on BLEU, embedding average, and grammatical and semantic coherence, enhancing long-distance attention.
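The two mechanisms the abstract describes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scalar per-distance bias (one common form of relative position coding), and the clipping distance `max_dist` are illustrative assumptions. The incomplete mask lets question (prefix) tokens attend bidirectionally within the question, while reply tokens attend causally.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def incomplete_mask(q_len, r_len):
    """Incomplete mask: question tokens see the whole question
    (bidirectional prefix); reply tokens see the question plus
    earlier reply tokens only (causal)."""
    T = q_len + r_len
    m = np.tril(np.ones((T, T)))   # causal base for reply tokens
    m[:q_len, :q_len] = 1.0        # fully bidirectional question block
    return m

def relative_attention(Q, K, V, rel_bias, mask, max_dist=8):
    """Self-attention with a relative-position bias added to the logits.

    rel_bias: (2*max_dist + 1,) learned scalars, indexed by the
              relative distance j - i clipped to [-max_dist, max_dist].
    mask:     (T, T), 1 where attention is allowed, 0 where blocked.
    """
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # content term
    idx = np.clip(np.arange(T)[None, :] - np.arange(T)[:, None],
                  -max_dist, max_dist) + max_dist
    scores = scores + rel_bias[idx]                     # relative position term
    scores = np.where(mask == 1, scores, -1e9)          # apply incomplete mask
    return softmax(scores) @ V
```

Because the bias depends only on the clipped offset `j - i`, attention to a token 5 positions back is scored the same way anywhere in the sequence, which is the long-distance generalization argument made in the paper; the absolute position embedding layer is simply dropped.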

Keywords