Sentiment Analysis of User Texts with Machine Learning Methods
https://doi.org/10.26794/3033-7097-2025-1-4-16-25
Abstract
This paper explores the application of machine learning methods for sentiment analysis of user-generated texts in the Russian social network VKontakte. The sentiments of millions of users can be monitored and analyzed in real time, which facilitates prompt decision-making and forecasting of social processes. Textual data, including posts and comments, were collected via the VK API. The preprocessing pipeline involved text cleaning, lemmatization, stop-word removal, and TF-IDF vectorization. Several classification models were tested, including logistic regression, random forest, and naïve Bayes, as well as deep learning models such as LSTM and Transformer-based models (RuBERT). The naïve Bayes classifier demonstrated the best performance in terms of recall and overall metric balance. Sentiment analysis results revealed that the majority of user texts were neutral or positive, with only a small portion being negative. The paper includes visualizations and statistical summaries of sentiment distribution. The study confirms the effectiveness of classical machine learning methods for processing and analyzing textual data in Russian social networks.
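The pipeline named in the abstract (text cleaning, stop-word removal, TF-IDF vectorization, naïve Bayes classification) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The stop-word list, the four training texts, and their labels are invented for illustration, and all function and class names are hypothetical. Lemmatization is left out because it would require a morphological analyzer, and data collection via the VK API is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"и", "в", "на", "это", "но"}


def preprocess(text):
    """Lowercase, strip URLs and non-letter characters, drop stop words.

    Lemmatization is omitted in this sketch.
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-zа-яё]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """Map tokens to TF-IDF weights; terms unseen during fitting are ignored."""
    counts = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with additive smoothing."""

    def fit(self, vectors, labels, vocab, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(labels))
            # Sum the TF-IDF weight of every term over the class's documents.
            totals = defaultdict(float)
            for v in rows:
                for t, w in v.items():
                    totals[t] += w
            denom = sum(totals.values()) + alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((totals[t] + alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t] for t, w in vector.items()
            )
        return max(self.classes, key=score)


# Invented toy data; real training would use labeled VKontakte posts and comments.
train_texts = [
    "Отличный концерт, очень понравилось!",
    "Прекрасный день и хорошее настроение",
    "Ужасный сервис, очень плохо",
    "Плохая погода, ужасное настроение",
]
train_labels = ["positive", "positive", "negative", "negative"]

docs = [preprocess(t) for t in train_texts]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
model = NaiveBayes().fit(vectors, train_labels, vocab=set(idf))

prediction = model.predict(tfidf(preprocess("Отличный день, хорошее настроение"), idf))
print(prediction)
```

In practice, a library implementation such as scikit-learn's `TfidfVectorizer` and `MultinomialNB` would likely replace the hand-written parts, and a neutral class would be added to match the three sentiment categories mentioned in the abstract.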
About the Authors
E. A. Gorbunova
Russia
Ekaterina A. Gorbunova — Senior Software Developer
Saint Petersburg
R. A. Kochkarov
Russia
Rasul A. Kochkarov — Cand. Sci. (Econ.), Assoc. Prof. of Artificial Intelligence Department, Faculty of Information Technology and Big Data Analysis
Moscow
E. A. Okuneva
Russia
Evelina A. Okuneva — Assistant of the Department of Mathematics and Data Analysis, Faculty of Information Technology and Big Data Analysis
Moscow
For citations:
Gorbunova E.A., Kochkarov R.A., Okuneva E.A. Sentiment Analysis of User Texts with Machine Learning Methods. Digital Solutions and Artificial Intelligence Technologies. 2025;1(4):16-25. (In Russ.) https://doi.org/10.26794/3033-7097-2025-1-4-16-25
