
Digital Solutions and Artificial Intelligence Technologies


Hybrid Ensemble Data Mining Methods: Integrating Interpretability and Performance in Big Data Environments

https://doi.org/10.26794/3030-7097-2026-2-1-6-15

Abstract

This study analyzes hybrid ensemble methods that combine classical machine learning algorithms with modern deep learning models for classification and forecasting on large datasets. Its main goal is to develop and empirically validate a methodological approach that balances model performance against the explainability of model decisions. The study applies stacking, bagging, and boosting together with interpretable machine learning techniques, including SHAP analysis and feature importance methods. The empirical results show that the proposed hybrid architecture improves classification accuracy by 12–18% over the baseline models while keeping an interpretability score above 0.85 as measured with LIME. The best-performing ensemble configuration combines random forest, gradient boosting, and neural networks with weight coefficients of 0.4, 0.35, and 0.25, respectively. The theoretical significance of the work lies in extending the methodological framework of data mining by integrating the principles of explainable AI into ensemble architectures. Its practical value lies in the applicability of the approach to critical domains that require transparent decision-making.
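The abstract reports the weight coefficients but not how they are applied. One common reading is weighted soft voting, where each model's class probabilities are averaged using those weights. The following is a minimal sketch under that assumption; the function names, the model keys, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Weights reported in the abstract. Applying them as soft-voting weights
# is an assumption of this sketch, not something the abstract states.
WEIGHTS: Dict[str, float] = {
    "random_forest": 0.40,
    "gradient_boosting": 0.35,
    "neural_network": 0.25,
}


def weighted_soft_vote(probas: Dict[str, List[float]],
                       weights: Dict[str, float] = WEIGHTS) -> List[float]:
    """Combine per-model class probabilities into one weighted average."""
    if set(probas) != set(weights):
        raise ValueError("probas and weights must name the same models")
    n_classes = len(next(iter(probas.values())))
    if any(len(p) != n_classes for p in probas.values()):
        raise ValueError("all models must report the same number of classes")
    total = sum(weights.values())  # normalise in case weights do not sum to 1
    return [
        sum(weights[m] * probas[m][k] for m in probas) / total
        for k in range(n_classes)
    ]


def predict_class(combined: List[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical probabilities for one sample in a two-class problem.
example = {
    "random_forest": [0.30, 0.70],
    "gradient_boosting": [0.60, 0.40],
    "neural_network": [0.55, 0.45],
}
combined = weighted_soft_vote(example)
predicted = predict_class(combined)
```

In this made-up example the two lower-weighted models lean toward class 0, yet the combined vote still favors class 1 because the random forest carries the largest weight. A stacking meta-learner, which the abstract also mentions, would learn such weights from data instead of fixing them.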

About the Author

S. V. Markova
Financial University under the Government of the Russian Federation
Russian Federation

Svetlana V. Markova — Cand. Sci. (Tech.), Assoc. Prof., Assoc. Prof., Department of Mathematics and Data Analysis, Faculty of Information Technology and Big Data Analysis

Moscow






For citations:


Markova S.V. Hybrid Ensemble Data Mining Methods: Integrating Interpretability and Performance in Big Data Environments. Digital Solutions and Artificial Intelligence Technologies. 2026;2(1):6-15. (In Russ.) https://doi.org/10.26794/3030-7097-2026-2-1-6-15




This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 3033-7097 (Online)