Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Seriâ 2. Âzykoznanie (Dec 2021)
RUSSIAN TEXT AUTHOR’S GENDER IDENTIFICATION IN FORENSIC EXAMINATION: PROBABILITY-AND-STATISTICS METHOD
Abstract
The article discusses intermediate research results in the development and improvement of a computerized model of Russian texts authorization, which is based on complex application of probabilistic-and- statistical methods. The study aims to describe the new capabilities of the created system in the aspect of its application to diagnostic examinations in text authorization for detection of the gender of the alleged author of the text. The work presents the next stage of fine-tuning and testing of the improved version of the computer program “CTA” (computerized text authorization), which at this stage was adapted for the task of determining and comparing stable relative frequencies of correlation coefficients (the ratio of specified linguistic phenomena of different levels of the language system) in the texts, the authors of which are men and women. The research material is the continuously updated primary bases of literary texts of the 19 th and 21 st centuries (4 bases, respectively). The work shows that for the texts written by men and women, significant differences can be noted in such correlation coefficients as average word length, average sentence length, objectivity coefficient, quality coefficient, activity coefficient, dynamism coefficient, connectivity coefficient, etc. Verification of the results obtained experimentally has demonstrated that the accuracy of gender determining at this stage of the study is approximately 65%. This indicator can be significantly exceeded with an increase in the volume and quality specification of databases and/or when using new models for calculating the correlation coefficients (Spearman’s model, etc.).
Keywords