The use of psychophysiological tools for assessing consistency in the markup of the tonality of texts with code-switching based on a novel by Sergey Minaev

Authors

DOI:

https://doi.org/10.33910/2687-0223-2023-5-1-37-45

Keywords:

2B-PLS, code-switching, named entities, sentiment analysis, Sergey Minaev, information extraction, text mining

Abstract

Code-switching is a phenomenon in which two or more languages occur in the same message. Messages containing mixed languages are quite common in social networks, as well as the discourse of IT specialists and bilinguals. Code-switching presents a challenge for sentiment analysis and other natural language processing tasks.

This article explores whether PLS analysis can identify complex cases in the markup of texts with code-switching. Sergey Minaev’s 2008 novel The Chicks. A Tale of Unreal Love was chosen for the analysis. One hundred sentences containing words written in Latin script were selected from both the author’s narration and the characters’ dialogue. The resulting dataset was marked up in CSV format for subsequent model construction.

The parameters for the analysis were derived from an expert assessment of the sentiment of the selected sentences: the number of entities in Latin script and the total number of entities in the phrase; the consistency of expert assessments of positive, negative and neutral sentiment; the categories of entities in Latin script (Location, Person, Time/Date, Brand, Organization, Model); and the presence of an insignificant entity.
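The parameters listed above could be laid out as one CSV row per sentence. The sketch below is only an illustration: the column names, the two sample rows, and the agreement measure (share of experts choosing the majority label) are assumptions introduced here, not the schema or the consistency measure used in the article.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV layout; column names and rows are invented for illustration.
SAMPLE = """sentence_id,n_latin_entities,n_total_entities,latin_category,expert_1,expert_2,expert_3
1,1,3,Brand,positive,positive,neutral
2,2,2,Location,negative,negative,negative
"""

EXPERT_COLUMNS = ("expert_1", "expert_2", "expert_3")


def agreement_share(labels):
    """Share of experts who chose the most frequent label (an assumed measure)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


rows = []
for record in csv.DictReader(io.StringIO(SAMPLE)):
    labels = [record[name] for name in EXPERT_COLUMNS]
    rows.append(
        {
            "sentence_id": int(record["sentence_id"]),
            "n_latin_entities": int(record["n_latin_entities"]),
            "n_total_entities": int(record["n_total_entities"]),
            "latin_category": record["latin_category"],
            "agreement": agreement_share(labels),
        }
    )
```

Each dictionary in `rows` then holds the entity counts, the entity category, and a single agreement value for one sentence, which is a convenient shape for building the two blocks of a later multivariate analysis.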

The 2B-PLS analysis showed that the consistency with which experts mark up the sentiment of a phrase can be analyzed in relation to the knowledge extracted from the sentences, i.e., named entities and other statistics. The consistency of expert assessments is influenced not only by the category of entities but also by the sentiment of the phrase, the total number of entities, and the number of entities in Latin script in the phrase.
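For readers unfamiliar with two-block PLS, the core computation can be sketched as follows: center both blocks of variables, form their cross-covariance matrix, and take its leading singular vectors as the pair of latent axes with maximal covariance. The sketch is a minimal pure-Python illustration under stated assumptions: the toy numbers are invented, the singular pair is found by simple power iteration, and nothing here reproduces the software or the data used in the article.

```python
import math

# Toy data, invented for illustration only (not taken from the study).
# Block X: [entities in Latin script, total entities] per sentence.
X = [[1, 2], [2, 3], [3, 5], [4, 6]]
# Block Y: [share of experts agreeing, share of "neutral" labels] per sentence.
Y = [[0.9, 0.1], [0.8, 0.3], [0.6, 0.2], [0.5, 0.4]]


def center(M):
    """Subtract the column mean from every column."""
    n = len(M)
    means = [sum(col) / n for col in zip(*M)]
    return [[value - m for value, m in zip(row, means)] for row in M]


def cross_covariance(X, Y):
    """Return the p x q matrix of covariances between X and Y columns."""
    Xc, Yc = center(X), center(Y)
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    return [
        [sum(Xc[i][a] * Yc[i][b] for i in range(n)) / (n - 1) for b in range(q)]
        for a in range(p)
    ]


def first_singular_pair(R, iters=200):
    """Leading singular triple (u, s, v) of R via alternating power iteration."""
    p, q = len(R), len(R[0])
    v = [1.0 / math.sqrt(q)] * q
    for _ in range(iters):
        u = [sum(R[a][b] * v[b] for b in range(q)) for a in range(p)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(R[a][b] * u[a] for a in range(p)) for b in range(q)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
    return u, s, v


R = cross_covariance(X, Y)
u, s, v = first_singular_pair(R)

# Project each centered block onto its latent axis to get per-sentence scores.
x_scores = [sum(a * w for a, w in zip(row, u)) for row in center(X)]
y_scores = [sum(b * w for b, w in zip(row, v)) for row in center(Y)]
```

The weights in `u` and `v` indicate which variables of each block contribute most to the shared pattern, and plotting `x_scores` against `y_scores` shows how strongly the two blocks covary across sentences. A production analysis would normally use a library SVD rather than power iteration and would extract more than one pair of axes.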

The results obtained are consistent with theoretical studies.

Published

2023-04-03

Issue

Section

Articles