Applied Mathematics and Nonlinear Sciences (Jan 2024)

Statistical language model-based analysis of the English-Chinese corpus and political discourse

  • Sun Xueyu,
  • Zhang Songsong

DOI
https://doi.org/10.2478/amns.2023.2.00387
Journal volume & issue
Vol. 9, no. 1

Abstract

Politics and political discourse are closely tied to people’s daily lives, and this study proposes a new approach to political discourse analysis that combines English and Chinese corpora. After examining the composition of formal languages and the grammar generation process, the paper proposes an improved N-gram algorithm that addresses the low accuracy of the standard N-gram model on low-frequency words, introducing alternative words to alleviate data sparsity. A critical metaphor analysis of political discourse in the English-Chinese corpus is then carried out with the improved statistical language model, and the convergence of political discourse is studied in terms of space and time. In an analysis of the political discourse of American presidents, the spatial centrality factors “we” and “our nation” were accurately extracted, with correlations of 0.83, 0.73, 0.68, 0.51, 0.76, and 0.41, in order. In the temporal convergence of political discourse, the correlations of the unqualified facsimile noun phrases reached 0.28, 0.25, 0.72, 0.68, and 0.54, respectively. The accuracy of the improved N-gram model was about 28.1% higher than that of the traditional method, which suggests that statistical language models are feasible and applicable for political discourse analysis.
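
The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea it names: before counting, low-frequency words are replaced by alternative words so that sparse contexts share statistics. Everything in it is an assumption made for illustration, not the authors' method: the function name `train_bigram`, the hand-written `substitutes` mapping, the `min_count` threshold, the `<unk>` fallback, and the use of add-one smoothing on a bigram model.

```python
from collections import Counter

BOUNDARY = {"<s>", "</s>"}  # sentence boundary markers, never substituted


def train_bigram(sentences, substitutes, min_count=2):
    """Train an add-one-smoothed bigram model on tokenized sentences.

    Words seen fewer than `min_count` times are replaced by their entry in
    `substitutes` (a hypothetical rare-word -> alternative-word mapping),
    or by "<unk>" when no alternative is given.
    """
    raw = Counter(w for s in sentences for w in s)

    def normalize(word):
        # Keep boundary markers and sufficiently frequent words unchanged.
        if word in BOUNDARY or raw[word] >= min_count:
            return word
        return substitutes.get(word, "<unk>")

    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + [normalize(w) for w in s] + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs

    def prob(prev, word):
        # P(word | prev) with add-one smoothing over the normalized vocabulary.
        prev, word = normalize(prev), normalize(word)
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

    return prob


corpus = [
    ["we", "love", "our", "nation"],
    ["we", "love", "our", "country"],
    ["we", "serve", "our", "nation"],
]
prob = train_bigram(corpus, {"country": "nation", "serve": "love"})
print(prob("our", "country"))
```

In this toy corpus, "country" occurs only once, so it borrows the counts of its alternative "nation" and receives the same probability after "our". A real system would need a principled way to choose alternative words, which this sketch leaves as a hand-written mapping.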

Keywords