Statistical language model-based analysis of the English-Chinese corpus and political discourse

Sun Xueyu; Zhang Songsong

doi:10.2478/amns.2023.2.00387

Applied Mathematics and Nonlinear Sciences (Jan 2024)

Affiliations

Sun Xueyu: School of Foreign Languages, Jiangsu Open University, Nanjing, Jiangsu, 210036, China
Zhang Songsong: School of Foreign Languages, Jinling Institute of Technology, Jiangsu, 211169, China

DOI: https://doi.org/10.2478/amns.2023.2.00387
Journal volume & issue: Vol. 9, no. 1

Abstract

Politics and political discourse are closely tied to people’s daily lives. This study proposes a new approach to political discourse analysis that combines English and Chinese corpora. After examining the composition of formal languages and the grammar generation process, the paper proposes an improved N-gram algorithm that addresses the low accuracy of the standard N-gram model on low-frequency words, introducing alternative words to alleviate data sparsity. A critical metaphor analysis of political discourse in the English-Chinese corpus is then conducted with the improved statistical language model, and the convergence of political discourse is studied in terms of space and time. In an analysis of the political discourse of American presidents, the spatial centrality factors “we” and “our nation” were accurately extracted, with correlations of 0.83, 0.73, 0.68, 0.51, 0.76, and 0.41 in order. For the temporal convergence of political discourse, the correlations of the unqualified facsimile noun phrases reached 0.28, 0.25, 0.72, 0.68, and 0.54, respectively. The accuracy of the improved N-gram model was about 28.1% higher than that of the traditional method, which indicates that statistical language models are feasible and applicable for political discourse analysis.
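
The abstract does not spell out the improved N-gram algorithm, so the following is only a minimal sketch of the general idea of introducing alternative words for low-frequency words, not the authors' method. It assumes a bigram model with add-one smoothing. The `substitutes` mapping, the `min_count` threshold, and the toy sentences are all hypothetical choices made for illustration.

```python
from collections import Counter

def train_bigram(sentences, substitutes, min_count=2):
    """Count unigrams and bigrams after mapping rare words to substitutes."""
    raw = Counter(w for s in sentences for w in s)

    def normalize(word):
        # Frequent words are kept as they are. Rare or unseen words fall back
        # to a listed substitute word, otherwise to a generic "<unk>" token.
        if raw[word] >= min_count:
            return word
        return substitutes.get(word, "<unk>")

    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + [normalize(w) for w in s] + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, normalize

def bigram_prob(prev, word, unigrams, bigrams, normalize):
    """Add-one smoothed P(word | prev) over the normalized vocabulary."""
    prev = prev if prev == "<s>" else normalize(prev)
    word = word if word == "</s>" else normalize(word)
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (hypothetical): "nation" occurs once, so it borrows the
# statistics of its assumed substitute "country".
sentences = [
    ["we", "love", "our", "nation"],
    ["we", "serve", "our", "country"],
    ["we", "love", "our", "country"],
]
substitutes = {"nation": "country"}
unigrams, bigrams, normalize = train_bigram(sentences, substitutes)
p_nation = bigram_prob("our", "nation", unigrams, bigrams, normalize)
p_country = bigram_prob("our", "country", unigrams, bigrams, normalize)
```

In this sketch the rare word and its substitute receive the same conditional probability, which is one simple way to reduce the effect of sparse counts. The paper's actual substitution strategy may differ.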

Published in Applied Mathematics and Nonlinear Sciences

ISSN: 2444-8656 (Online)
Publisher: Sciendo
Country of publisher: Poland
LCC subjects: Science: Mathematics
Website: https://sciendo.com/journal/AMNS
