Identification of authorship based on vectorization of lyrics by Anna Akhmatova and Marina Tsvetaeva with psychophysiological methods




2B-PLS, poems by A. Akhmatova, poems by M. Tsvetaeva, differentiation of authors, vectorization of texts


Goal-oriented dialog systems can use an authorship identification component to give a more accurate answer that addresses a specific user and takes that user's characteristics into account. This article examines whether the works of two authors can be distinguished from the results of vectorizing their texts, using methods that have proved successful in interdisciplinary research.

2B-PLS (Two-Block Projection to Latent Structure) has proved highly effective for analyzing the results of interdisciplinary research in neurolinguistics, psychophysiology and other fields.
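As a rough illustration of the technique, two-block PLS is commonly computed as a singular value decomposition of the cross-covariance matrix between two blocks of variables. The sketch below follows that common formulation with NumPy. It is not the author's implementation: the function name `two_block_pls`, the z-scoring of every column, and the choice of blocks (text features in one block, an author indicator in the other) are assumptions made only for illustration.

```python
import numpy as np


def two_block_pls(X, Y):
    """Minimal two-block PLS sketch: SVD of the cross-covariance of two blocks.

    X : (n_samples, p) array, e.g. one row of text features per poem (assumed).
    Y : (n_samples, q) array, e.g. a 0/1 author indicator column (assumed).
    Assumes no column in X or Y is constant, since columns are z-scored.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Z-score each column so features on different scales are comparable
    # (an assumption of this sketch, not something stated in the article).
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

    # Cross-covariance (here: cross-correlation) matrix between the blocks.
    R = Zx.T @ Zy / (n - 1)

    # Singular vectors give the paired latent axes of the two blocks.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)

    return {
        "x_weights": U,            # loadings of the X variables on each latent axis
        "y_weights": Vt.T,         # loadings of the Y variables on each latent axis
        "singular_values": s,      # strength of covariation along each axis
        "x_scores": Zx @ U,        # per-sample projections of the X block
        "y_scores": Zy @ Vt.T,     # per-sample projections of the Y block
    }
```

Under these assumptions, the sign and magnitude of each entry in `x_weights` indicate which text parameters vary together with the second block, which is the kind of per-parameter contrast between the two poets reported below.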

The study described in this article used lyric poems by Anna Akhmatova and Marina Tsvetaeva, whose works remain the subject of active scholarship. The author selected 310 poetic texts: 196 poems by Akhmatova and 114 by Tsvetaeva. The parameters for the analysis were the results of text vectorization: the proportions of verbs, proper names, adjectives, adverbs, unique words, punctuation marks, function words and content words in the text, together with average line length, number of lines and variety of punctuation marks.
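To make the vectorization step more concrete, the sketch below computes the surface parameters that need no linguistic annotation: number of lines, average line length, proportion of unique words, proportion of punctuation marks and variety of punctuation marks. The part-of-speech proportions are left out because they require a morphological tagger. The function name `surface_features`, the punctuation inventory, and the exact definitions (line length counted in words, punctuation share counted over words plus punctuation marks) are assumptions for illustration and may differ from those used in the study.

```python
import re
import string

# Assumed punctuation inventory: ASCII punctuation plus the em dash, en dash,
# guillemets and ellipsis, which are frequent in Russian verse.
PUNCT = set(string.punctuation) | set("\u2014\u2013\u00ab\u00bb\u2026")


def surface_features(poem: str) -> dict:
    """Return annotation-free text parameters for a single poem."""
    # Non-empty lines only, so stanza breaks are not counted as lines.
    lines = [ln.strip() for ln in poem.splitlines() if ln.strip()]
    # \w is Unicode-aware in Python 3, so Cyrillic words are matched.
    words = re.findall(r"\w+", poem.lower())
    punct = [ch for ch in poem if ch in PUNCT]

    n_words = len(words)
    n_tokens = n_words + len(punct)
    words_per_line = [len(re.findall(r"\w+", ln)) for ln in lines]

    return {
        "n_lines": len(lines),
        # Average line length measured in words (an assumed definition).
        "avg_line_len": sum(words_per_line) / len(lines) if lines else 0.0,
        "unique_word_ratio": len(set(words)) / n_words if n_words else 0.0,
        "punct_ratio": len(punct) / n_tokens if n_tokens else 0.0,
        "punct_variety": len(set(punct)),
    }
```

Applying such a function to every poem would yield one row per text; stacking the rows gives a feature matrix of the kind the analysis operates on.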

The 2B-PLS analysis based on the vectorization results for the texts in question showed clear differences between the works of the two poets. The author discussed these findings with scholars of Akhmatova and Tsvetaeva.

The lyrics of Akhmatova (as compared to those of Tsvetaeva) are characterized by more frequent use of verbs, adjectives, adverbs, function words and content words, as well as a larger number of lines and a greater variety of punctuation marks.

The lyrics of Tsvetaeva (as compared to those of Akhmatova) are characterized by longer lines and a more diverse vocabulary, as well as more frequent use of punctuation marks, nouns and proper names.

The results obtained are consistent with theoretical studies.



