Volume 2 Issue 4 | 2025
Paper Id: IJMSM-V2I4P110
doi: 10.71141/30485037/V2I4P110
Improvements of Performance of Diabetes Prediction Using Combined Machine Learning Models and Explainable AI Techniques
Pham Tuan Khanh, Vu Xuan Manh, Hoang Thanh Son, Vu Gia Tue, Nguyen Hoang Phong, Nguyen Minh Huy, Tran Ngoc Phuong Linh
Citation:
Pham Tuan Khanh, Vu Xuan Manh, Hoang Thanh Son, Vu Gia Tue, Nguyen Hoang Phong, Nguyen Minh Huy, Tran Ngoc Phuong Linh, "Improvements of Performance of Diabetes Prediction Using Combined Machine Learning Models and Explainable AI Techniques," International Journal of Multidisciplinary on Science and Management, Vol. 2, No. 4, pp. 84-88, 2025.
Abstract:
Diabetes is one of the most dangerous non-communicable diseases, affecting 537 million people worldwide. It affects the pancreas, leaving the body unable to produce insulin (a hormone needed to maintain blood glucose levels). One of its symptoms is increased urination. Early and accurate prediction of diabetes mellitus can help reduce its effects, but prediction remains a challenge for medical doctors. To address this, we apply an AI model using the Pima Indian dataset and data from female patients in Bangladesh to help professionals gain preliminary knowledge about the disease in their patients. We also used a semi-supervised technique to fill in missing insulin values in the private dataset. SMOTE and ADASYN were employed to address the class imbalance problem. A combination of machine learning classifiers and ensemble techniques was applied, and the performance of the different algorithms was compared to determine which gives the best result. After training, the proposed method obtained an accuracy of 83%. The results show promising applications in healthcare.
Keywords:
Diabetes prediction, Healthcare, Machine learning.
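Illustrative note on the class-imbalance step: the abstract names SMOTE and ADASYN but does not describe the configuration used. The following is a minimal, standard-library sketch of the core SMOTE idea only (interpolating between a minority-class sample and one of its nearest minority-class neighbours). The function name `smote_sketch`, the parameter values, and the toy (glucose, BMI) data are all hypothetical and are not taken from the paper; it is not the authors' implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For every minority sample, find the indices of its k nearest
    # minority neighbours by Euclidean distance (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample
        j = rng.choice(neighbours[i])      # pick one of its neighbours
        gap = rng.random()                 # position along the segment
        x, z = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic


# Hypothetical toy data: (glucose, BMI) pairs for the minority class.
minority = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1], [197.0, 30.5]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

ADASYN follows the same interpolation scheme but generates more synthetic points around minority samples that have a larger share of majority-class neighbours. In practice, a maintained library implementation would normally be preferred over a hand-written sketch like this one.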