Humanities & Social Sciences Communications (Jan 2024)
Dissecting The Analects: an NLP-based exploration of semantic similarities and differences across English translations
Abstract
The Analects, a classic Chinese masterpiece compiled during China’s Warring States Period, encapsulates the teachings and actions of Confucius and his disciples. Its ideas remain relevant and continue to exert substantial influence in modern society. The availability of over 110 English translations reflects the significant demand among English-speaking readers. Understanding the distinctive characteristics of each translation is essential for guiding future translators and helping readers make informed selections. This research builds a corpus from five English translations of The Analects and quantifies semantic similarity at the sentence level, employing natural language processing algorithms such as Word2Vec, GloVe, and BERT. The findings highlight semantic variations among the five translations, and the compared sentence pairs are categorized as “Abnormal,” “High-similarity,” or “Low-similarity.” This categorization supports a quantitative discussion of the similarities and differences among the translations. Detailed analysis shows that factors such as core conceptual words and personal names in the translated text significantly affect semantic representation. This research aims to enrich readers’ holistic understanding of The Analects and offers practical recommendations and strategies to future translators of this seminal work.
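As a rough illustration of what sentence-level semantic similarity can mean in practice, the following minimal sketch averages word vectors into a sentence vector and compares two renderings by cosine similarity. It is not the authors' pipeline: the three-dimensional vectors in `TOY_VECTORS` are invented for this example, the function names are hypothetical, and a real study would load pretrained Word2Vec, GloVe, or BERT embeddings instead.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# A real pipeline would load pretrained Word2Vec, GloVe, or BERT embeddings.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.1],
    "master": [0.9, 0.2, 0.1],
    "confucius": [0.8, 0.3, 0.1],
    "said": [0.2, 0.9, 0.1],
    "remarked": [0.3, 0.8, 0.2],
}


def sentence_vector(sentence):
    """Mean-pool the vectors of the words that appear in TOY_VECTORS."""
    words = [w for w in sentence.lower().split() if w in TOY_VECTORS]
    if not words:
        raise ValueError("no known words in sentence")
    dim = len(next(iter(TOY_VECTORS.values())))
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_similarity(s1, s2):
    """Similarity of two sentences via mean-pooled word vectors."""
    return cosine_similarity(sentence_vector(s1), sentence_vector(s2))


if __name__ == "__main__":
    # Two hypothetical renderings of the same opening formula.
    print(sentence_similarity("The Master said", "Confucius remarked"))
```

With mean pooling, the choice between a title such as "Master" and a personal name such as "Confucius" changes the sentence vector directly, which is one simple way to see why personal names can shift a similarity score.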