
Digital Solutions and Artificial Intelligence Technologies


Modern Methods of Document Processing for Calculating Stock Market Indicators

https://doi.org/10.26794/3033-7097-2025-1-4-6-15

Abstract

This article discusses modern methods of extrapolating pre-trained transformers aimed at improving their ability to process long and short Russian-language text sequences in the financial sector. Particular attention is paid to the task of classifying texts that reflect broker analysts’ expectations of market movements (growth, decline, or uncertain change). To solve this problem, the application of the lightweight language models ruBERT-tiny1 and ruBERT-tiny2 is investigated; these models are adapted to work effectively with large volumes of input data while maintaining prediction quality. The paper analyzes various approaches to expanding the context window of the models, including extrapolation methods, and considers the impact of tokenization, vectorization, and embedding strategies on the final classification results. It also discusses the specifics of using transformers under conditions of increased market volatility and changing news flows, which allows a deeper assessment of the stability of the proposed solutions. Furthermore, a formula for calculating a leading indicator for stock markets is proposed and discussed, demonstrating the practical significance of transformer models in the analysis of financial texts and the construction of analytical metrics. The presented results highlight the promise of compact transformers for predictive financial analytics tasks.
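As a purely illustrative companion to the abstract, the sketch below shows a minimal three-class classification pipeline built on the publicly available cointegrated/rubert-tiny2 checkpoint, together with a naive aggregation of class probabilities into a toy indicator. The label set, the checkpoint, and the aggregation rule are assumptions made for this sketch; they are not the formula or implementation proposed by the authors.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cointegrated/rubert-tiny2"     # compact Russian BERT checkpoint (assumed here)
LABELS = ["decline", "uncertain", "growth"]  # hypothetical label order

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# The classification head is freshly initialised; fine-tuning on labelled
# broker commentary is required before the probabilities are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))
model.eval()

def classify(texts):
    """Return class probabilities with shape (len(texts), 3)."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1)

def toy_indicator(probs):
    """Illustrative aggregate in [-1, 1]: mean of P(growth) - P(decline) over the batch."""
    return float((probs[:, LABELS.index("growth")] - probs[:, LABELS.index("decline")]).mean())

texts = [
    "Аналитики ожидают роста индекса на фоне сильной отчётности.",        # growth expectation
    "Рынок остаётся под давлением, прогнозируется снижение котировок.",   # decline expectation
]
probs = classify(texts)
print(probs, toy_indicator(probs))

For documents longer than the 512-token limit implied by the truncation step above, the context-window extrapolation techniques surveyed in the article (e.g. LSG attention, reference [8]) would take the place of simple truncation.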

About the Authors

E. F. Boltachev
Financial University under the Government of the Russian Federation
Russia

Eldar F. Boltachev — Cand. Sci. (Tech.), Assoc. Prof. at the Artificial Intelligence Department, Faculty of Information Technology and Big Data Analysis

Moscow



A. I. Tyulyakov
Financial University under the Government of the Russian Federation
Russia

Alexander I. Tyulyakov — Master’s Programme Student at the Artificial Intelligence Department, Faculty of Information Technology and Big Data Analysis

Moscow



References

1. Lipatova S.V., Bochkareva Yu.E. Using NLP for the development of electronic teaching and methodological materials. Alley of Science. 2023;4(79):926-931. URL: https://www.elibrary.ru/item.asp?id=54082726

2. Pankratova M.D., Skovpel T.N. NLP models using neural networks in news sentiment analysis. Analytical Technologies in the Social Sphere: Theory and Practice. 2023;(15):97-107. URL: https://www.elibrary.ru/ctabku

3. Ryskin K.E., Vechkanova Y.S., Fedosin S.A. Processing of product items from distributors’ reports using NLP. Proceedings of the XXV Scientific and Practical Conference of Young Scientists, Postgraduate Students and Students of the National Research Mordovian State University. Saransk: National Research Mordovian State University named after N.P. Ogarev; 2022:271-276. URL: https://elibrary.ru/item.asp?id=54051425

4. Dubrovsky V.V., Karmanova E.V. Project for the Development of an Intelligent Online Service for Abstracting Text Documents Using NLP. Project Management. Proceedings of the II All-Russian Scientific Conference, Magnitogorsk, December 01–03, 2023. Magnitogorsk: Magnitogorsk State Technical University named after G.I. Nosov; 2024:37-45. URL: https://elibrary.ru/item.asp?id=60647866

5. Sennrich R., Haddow B., Birch A. Neural Machine Translation of Rare Words with Subword Units. Proceedings of ACL. 2016;1715-1725. DOI: 10.48550/arXiv.1508.07909

6. Song X., Salcianu A., Song Y., Dopson D., Zhou D. Fast WordPiece Tokenization. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2021;2089-2103. URL: https://aclanthology.org/2021.emnlp-main.160/

7. Vemula S.R., Sharma D.M., Krishnamurthy P. Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment. 2025. URL: https://arxiv.org/abs/2508.08424

8. Condevaux C., Harispe S. LSG Attention: Extrapolation of Pretrained Transformers to Long Sequences. In: Kashima H., Ide T., Peng W.-C., eds. Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science. 2023;13935:443-454. DOI: 10.1007/978-3-031-33374-3_35

9. Markov A.K., Semenochkin D.O., Kravets A.G., Yanovsky T.A. Comparative analysis of applied natural language processing technologies to improve the quality of digital document classification. International Journal of Information Technologies. 2024;12(3):66-77. URL: https://www.elibrary.ru/tubosi



For citations:


Boltachev E.F., Tyulyakov A.I. Modern Methods of Document Processing for Calculating Stock Market Indicators. Digital Solutions and Artificial Intelligence Technologies. 2025;1(4):6-15. (In Russ.) https://doi.org/10.26794/3033-7097-2025-1-4-6-15




This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 3033-7097 (Online)