Natural Language Processing and Fiction Text: Basis for Corpus Research

Alexey I. Gorozhanov; Innara A. Guseynova; Darya V. Stepanova

doi:10.22363/2313-2299-2024-15-1-195-210

RUDN Journal of Language Studies, Semiotics and Semantics (Mar 2024)

Affiliations

Alexey I. Gorozhanov: Moscow State Linguistic University
Innara A. Guseynova: Moscow State Linguistic University
Darya V. Stepanova: Minsk State Linguistic University

DOI: https://doi.org/10.22363/2313-2299-2024-15-1-195-210
Journal volume & issue: Vol. 15, no. 1
pp. 195 – 210

Abstract

The study applies NLP procedures to fiction texts in German and English, which are regarded as strong cultural texts. The aim is to develop a model of a software tool for processing, analyzing and interpreting a fiction text that would reveal the full potential of popular NLP tools within the corpus approach. The general methods used are analysis and synthesis; special methods, namely the descriptive method, modelling, and qualitative and quantitative analysis, are additionally used to address specific issues. The scientific novelty lies in applying the core principles of classical theories of text interpretation with the latest methods and tools of applied linguistics. As a practical result, special software has been developed that can process SQL-based linguistic corpora built automatically with the spaCy NLP library and the Python programming language. This software can be used for interpreting fiction texts, as well as for compiling learning materials for Home Reading. It is assumed that developing special software for strong cultural texts stimulates the search for scientific solutions and, at the same time, helps clarify the essential differences between natural and artificial intelligence.
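
The abstract names the building blocks (spaCy, Python, an SQL-based corpus) but not the schema or code, so the following is only a minimal sketch of how such a corpus might be assembled and queried, not the authors' software. The table name `token`, its columns, and the helper functions are hypothetical; SQLite stands in for whatever SQL backend the authors used; and a blank spaCy pipeline with the rule-based sentencizer is assumed so that no trained model has to be downloaded (lemma and part-of-speech fields stay empty unless a trained pipeline is substituted).

```python
import sqlite3
from typing import Iterable, List, Tuple

# One corpus row per token: (sentence_no, token_no, text, lemma, pos).
TokenRow = Tuple[int, int, str, str, str]


def annotate_with_spacy(text: str, lang: str = "en") -> List[TokenRow]:
    """Split `text` into sentences and tokens with spaCy.

    Assumption: a blank pipeline is used, so lemma and pos come back as
    empty strings; swap in a trained pipeline to fill them.
    """
    import spacy  # lazy import: the storage code below does not need spaCy

    nlp = spacy.blank(lang)
    nlp.add_pipe("sentencizer")  # rule-based sentence boundaries
    rows: List[TokenRow] = []
    for s_no, sent in enumerate(nlp(text).sents):
        for t_no, tok in enumerate(sent):
            rows.append((s_no, t_no, tok.text, tok.lemma_, tok.pos_))
    return rows


def build_corpus(conn: sqlite3.Connection, rows: Iterable[TokenRow]) -> None:
    """Create the (hypothetical) token table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS token ("
        "sentence_no INTEGER, token_no INTEGER, "
        "text TEXT, lemma TEXT, pos TEXT)"
    )
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def lemma_frequencies(conn: sqlite3.Connection, pos: str) -> List[Tuple[str, int]]:
    """Return (lemma, count) pairs for one part of speech, most frequent first."""
    cur = conn.execute(
        "SELECT lemma, COUNT(*) AS n FROM token WHERE pos = ? "
        "GROUP BY lemma ORDER BY n DESC, lemma ASC",
        (pos,),
    )
    return cur.fetchall()
```

Under these assumptions, a text would be loaded with `build_corpus(conn, annotate_with_spacy(text))`, after which queries such as `lemma_frequencies(conn, "NOUN")` could feed text interpretation or the selection of vocabulary for learning materials.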

Published in RUDN Journal of Language Studies, Semiotics and Semantics

ISSN: 2313-2299 (Print); 2411-1236 (Online)
Publisher: Peoples’ Friendship University of Russia (RUDN University)
Country of publisher: Russian Federation
LCC subjects: Language and Literature: Philology. Linguistics: Language. Linguistic theory. Comparative grammar: Semantics
Website: http://journals.rudn.ru/semiotics-semantics
