Hybrid Ensemble Data Mining Methods: Integrating Interpretability and Performance in Big Data Environments
https://doi.org/10.26794/3030-7097-2026-2-1-6-15
Abstract
This study analyzes hybrid ensemble methods that combine classical machine learning algorithms with deep learning models to solve classification and forecasting tasks on large datasets. The goal is to develop and empirically validate a methodological approach that balances model performance against the explainability of model decisions. The study combines stacking, bagging, and boosting with interpretable machine learning techniques, including SHAP analysis and feature importance methods. The empirical results show that the proposed hybrid architecture improves classification accuracy by 12–18% over the baseline models while keeping an interpretability score above 0.85 as measured with LIME. The optimal ensemble configuration combines random forest, gradient boosting, and neural networks with weight coefficients of 0.4, 0.35, and 0.25, respectively. The theoretical significance of the work lies in extending the methodological framework of data mining by integrating the principles of explainable AI into ensemble architectures. Its practical value lies in the applicability of the approach to critical domains that require transparent decision-making.
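The reported weight coefficients can be illustrated with a minimal sketch. The abstract does not state how the weights enter the ensemble, so this example assumes weighted soft voting, that is, a weighted average of each model's class probabilities. The function names, the model keys, and the sample probabilities are hypothetical and serve only as illustration.

```python
from typing import Mapping, Sequence

# Weight coefficients reported in the abstract (random forest, gradient
# boosting, neural network). Applying them as soft-voting weights is an
# assumption of this sketch, not something the abstract confirms.
ENSEMBLE_WEIGHTS = {
    "random_forest": 0.40,
    "gradient_boosting": 0.35,
    "neural_network": 0.25,
}


def weighted_soft_vote(
    class_probs: Mapping[str, Sequence[float]],
    weights: Mapping[str, float] = ENSEMBLE_WEIGHTS,
) -> list[float]:
    """Blend per-model class probabilities into a single distribution."""
    if set(class_probs) != set(weights):
        raise ValueError("each weighted model needs exactly one probability vector")
    lengths = {len(p) for p in class_probs.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same set of classes")

    total = sum(weights.values())  # normalise in case weights do not sum to 1
    blended = [0.0] * lengths.pop()
    for name, probs in class_probs.items():
        for k, p in enumerate(probs):
            blended[k] += weights[name] / total * p
    return blended


def predict_label(
    class_probs: Mapping[str, Sequence[float]],
    weights: Mapping[str, float] = ENSEMBLE_WEIGHTS,
) -> int:
    """Return the index of the class with the highest blended probability."""
    blended = weighted_soft_vote(class_probs, weights)
    return max(range(len(blended)), key=blended.__getitem__)


# Hypothetical probabilities for one sample over two classes.
example = {
    "random_forest": [0.30, 0.70],
    "gradient_boosting": [0.60, 0.40],
    "neural_network": [0.80, 0.20],
}
blended = weighted_soft_vote(example)
label = predict_label(example)
```

In this made-up sample the random forest favours class 1, but the two other models together outweigh it, so the blended distribution favours class 0. A stacking meta-learner, which the abstract also mentions, would learn such a combination from data rather than fix the weights in advance.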
About the Author
S. V. Markova
Russian Federation
Svetlana V. Markova — Cand. Sci. (Tech.), Assoc. Prof., Assoc. Prof., Department of Mathematics and Data Analysis, Faculty of Information Technology and Big Data Analysis
Moscow
For citations:
Markova S.V. Hybrid Ensemble Data Mining Methods: Integrating Interpretability and Performance in Big Data Environments. Digital Solutions and Artificial Intelligence Technologies. 2026;2(1):6-15. (In Russ.) https://doi.org/10.26794/3030-7097-2026-2-1-6-15
