dsait

Цифровые решения и технологии искусственного интеллекта

Digital Solutions and Artificial Intelligence Technologies

3033-7097

Финансовый университет при Правительстве Российской Федерации

10.26794/3030-7097-2026-2-1-6-15

dsait-45

Research Article

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Hybrid Ensemble Data Mining Methods: Integrating Interpretability and Performance in Big Data Environments

https://orcid.org/0000-0002-9145-5494

Маркова

С. В.

Markova

S. V.

Svetlana V. Markova, Cand. Sci. (Tech.), Assoc. Prof. (academic title), Associate Professor at the Department of Mathematics and Data Analysis, Faculty of Information Technology and Big Data Analysis

Moscow

SVmarkova@fa.ru

Финансовый университет при Правительстве Российской Федерации

Financial University under the Government of the Russian Federation

2026

22.04.2026

2(1):6-15

2026

Маркова С.В.

Markova S.V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://www.digitarin.ru/jour/article/view/45

This study presents a comprehensive analysis of hybrid ensemble methods that integrate classical machine learning algorithms with modern deep learning techniques for classification and forecasting on big data. The main goal is to develop and empirically validate a methodological approach that balances model performance against the explainability of its decisions.

The study applied stacking, bagging, and boosting in combination with interpretable machine learning techniques, including SHAP analysis and feature importance methods.

The empirical results show that the proposed hybrid architecture improves classification accuracy by 12–18% over the baseline models while keeping interpretability above 0.85 on a LIME-based metric. The optimal ensemble configuration combines random forest, gradient boosting, and neural networks with weights of 0.4, 0.35, and 0.25, respectively. The theoretical significance of the work lies in extending the methodological framework of data mining by integrating the principles of explainable AI into ensemble architectures. Its practical value lies in the applicability of the approach to critical domains that require transparent decision-making.

data mining; ensemble methods; explainable artificial intelligence; hybrid algorithms; machine learning; big data; interpretability

References

Zhou X., Du H., Xue S., Ma Z. Recent advances in data mining and machine learning for enhanced building energy management. Energy. 2024;307:132636. DOI: 10.1016/j.energy.2024.132636

Sarker I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Computer Science. 2021;2:160. DOI: 10.1007/s42979-021-00592-x

Khemani B., Patil S., Kotecha K., Tanwar S. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. Journal of Big Data. 2024;11:18. DOI: 10.1186/s40537-024-00888-z

Rahman A., Debnath T., Kundu D., Fahad Bin Mazhar M., Band S.S., Mosavi A. Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities. AIMS Public Health. 2024;11(1):58-109. DOI: 10.3934/publichealth.2024004

Talukder Md.A., Islam Md.M., Uddin Md.A., Hasan K.F., Sharmin S., Alyami S.A., Moni M.A. Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction. Journal of Big Data. 2024;11:5. DOI: 10.1186/s40537-024-00886-1

Wang H., Liang Q., Hancock J.T., Khoshgoftaar T.M. Feature selection strategies: a comparative analysis of SHAP-value and importance-based methods. Journal of Big Data. 2024;11:45. DOI: 10.1186/s40537-024-00914-0

Ziyadullaev D., Muhamediyeva D., Khujamkulova K., Abdurakhimov D., Maksumkhanova A., Ziyodullaeva G. Ensemble data mining methods for assessing soil fertility. E3S Web of Conferences. 2024;508:02013. DOI: 10.1051/e3sconf/202450802013

Demilie W.B. Plant disease detection and classification techniques: a comparative study of the performances. Journal of Big Data. 2024;11:28. DOI: 10.1186/s40537-024-00907-z

Stenhouse K., Quirk S., Cherpak L., Giaddui T., Yu Y., Teo B.K. Prospective validation of a machine learning model for applicator and hybrid interstitial needle selection in high-dose-rate cervical brachytherapy. Brachytherapy. 2024;23(2):145-153. DOI: 10.1016/j.brachy.2023.11.008

Azevedo R.C., Araújo R.A., Oliveira A.L.I. Hybrid approaches to optimization and machine learning methods: a systematic literature review. Machine Learning. 2024;113:4055-4097. DOI: 10.1007/s10994-023-06467-x

The author declares no conflicts of interest.