Methods of Data Mining in the Study of Economic Development of Countries
https://doi.org/10.26794/3030-7097-2026-2-1-45-51
Abstract
The paper demonstrates the analysis of a specific data set with machine learning methods and structural equation modeling. The research problem is to find a link between relative economic indicators (development indices) of the world's countries, using World Bank data for 2023. The analysis covered 205 countries with the weighted syndromes method and 180 countries with one- and two-factor confirmatory factor analysis. An advantage of machine learning methods is that they can use data with gaps, whereas the regression models underlying factor analysis do not tolerate missing values. The weighted syndromes method used eight main relative economic indicators of country development, with the GDP growth rate (%) for 2023 as the grouping indicator. Model quality was assessed by ROC AUC, which equals 0.92 and indicates that the selected features do make it possible to separate the countries. Scatter plots showing a clear division of the countries into two groups on the analyzed indicators also illustrate the quality of recognition. Factor analysis made it possible to build two models (one-factor and two-factor) using only four of the eight indicators, so that the models, assessed by the comparative fit index (CFI) and the Tucker-Lewis index (TLI), were statistically significant. However, the loadings of the one-factor model (λ) are extremely low (0.258, 0.131), which indicates a weak relationship between the observed variables and the latent factor: the factor explains almost none of the variance of the variables. These results indicate a poor fit of the model to the data, especially for the two-factor model, whose TLI of 0.474 is far below the commonly accepted threshold of >0.90–0.95.
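For readers unfamiliar with the fit indices mentioned above, the following is a minimal sketch of how TLI and CFI are commonly computed from the chi-square statistics of the fitted model and of the baseline (independence) model. The function names and all numeric inputs are hypothetical and are not taken from the paper; only the degrees of freedom are chosen to match a one-factor model with four indicators (2 for the model, 6 for the baseline).

```python
def tucker_lewis_index(chi2_model, df_model, chi2_base, df_base):
    """TLI: compares chi-square/df ratios of the baseline and fitted models."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def comparative_fit_index(chi2_model, df_model, chi2_base, df_base):
    """CFI: 1 minus the ratio of model to baseline non-centrality."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0  # neither model shows any misfit
    return 1.0 - d_model / d_base


# Hypothetical chi-square values, for illustration only (not the paper's results).
tli = tucker_lewis_index(chi2_model=10.0, df_model=2, chi2_base=50.0, df_base=6)
cfi = comparative_fit_index(chi2_model=10.0, df_model=2, chi2_base=50.0, df_base=6)
print(f"TLI = {tli:.3f}, CFI = {cfi:.3f}")
```

Under these formulas, values near 1 indicate that the fitted model improves substantially on the baseline, which is why a TLI well below 0.90 is read as poor fit.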
A meaningful economic interpretation of the latent factors is not provided, both because of the poor fit of the factor analysis results and because machine learning and factor analysis could not be compared on the same data: fewer data were suitable for factor analysis than for the machine learning methods, which demonstrated their adequacy.
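The ROC AUC used to assess the machine learning model can be read as the probability that a randomly chosen country from one group receives a higher score than a randomly chosen country from the other group. The sketch below computes it directly from that definition. The abstract does not say how the author computed the value, so this is only an illustration; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for one group of countries, 0 for the other (hypothetical coding).
    scores: model output, higher meaning "more likely group 1".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six countries, three in each group.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.92 means that the two groups of countries are ranked correctly in most pairs.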
About the Author
L. R. Borisova
Russian Federation
Lyudmila R. Borisova, Cand. Sci. (Phys. and Math.), Assoc. Prof., Department of Mathematics and Data Analysis, Faculty of Information Technology and Big Data Analysis
Moscow
For citations:
Borisova L.R. Methods of Data Mining in the Study of Economic Development of Countries. Digital Solutions and Artificial Intelligence Technologies. 2026;2(1):45-51. (In Russ.) https://doi.org/10.26794/3030-7097-2026-2-1-45-51
