Leveraging Machine Learning and SMOTE for Diabetes Prediction: Implementation of an Application Based on Indonesian Hospital Data

Authors

  • Arief Wibowo, Anis Fitri Nur Masruriyah, Selly Rahmawati

Keywords

Clinical Decision Support, Diabetes Prediction, Healthcare Diagnostics, Imbalanced Data, Machine Learning, Synthetic Minority Over-sampling Technique

Abstract

Diabetes mellitus is a widespread chronic condition affecting millions of people globally, including a substantial population in Indonesia. Early and accurate detection is critical for effective management and treatment, and machine learning offers a promising way to improve predictive accuracy. This study evaluates three machine learning algorithms, Support Vector Machine (SVM), Logistic Regression, and Naive Bayes, each with and without the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Data were collected from an Indonesian regional hospital and include medical parameters such as age, body mass index (BMI), blood sugar level, blood pressure, and family history. Without SMOTE, the SVM model achieved an accuracy of 95%, precision of 95%, recall of 97%, and an AUC of 98%; with SMOTE, it reached an accuracy of 95.8%, precision of 97%, recall of 94.6%, and an AUC of 99.1%. Logistic Regression without SMOTE achieved an accuracy of 94.8%, precision of 96.2%, recall of 96.2%, and an AUC of 98.3%; with SMOTE, it reached an accuracy of 95.6%, precision of 97.9%, recall of 93.3%, and an AUC of 98.7%. Naive Bayes without SMOTE achieved an accuracy of 93.5%, precision of 98.5%, recall of 91.9%, and an AUC of 98.1%; with SMOTE, it reached an accuracy of 94.3%, precision of 98.3%, recall of 90.2%, and an AUC of 98.6%. For all three models, SMOTE raised accuracy and AUC but lowered recall. The best-performing model, SVM with SMOTE, was implemented in a desktop application. Testing with this application validated the model's predictive capability and demonstrated accurate diabetes detection in practical scenarios. Our study highlights the impact of SMOTE on model performance and the value of machine learning techniques in advancing healthcare diagnostics. This work provides a foundation for further development and deployment of predictive models in clinical settings, contributing to improved patient care and disease management.
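The abstract names SMOTE but does not describe how it works. As a rough illustration only, the sketch below shows the core idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy feature rows are assumptions made for this example; they are not taken from the study, whose implementation and data are not given here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up rows of [age, BMI, blood sugar]; these values are NOT study data.
minority = [
    [52.0, 31.2, 148.0],
    [47.0, 29.8, 139.0],
    [60.0, 33.5, 162.0],
    [55.0, 30.1, 151.0],
]
majority_count = 10  # assumed size of the larger class

# Generate enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=majority_count - len(minority), k=3)
balanced_minority = minority + new_rows
print(len(balanced_minority))  # prints 10
```

In practice, oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as recall and AUC are computed on unmodified data. Whether the study followed this practice is not stated in the abstract.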

Published

2024-09-30

How to Cite

Arief Wibowo, Anis Fitri Nur Masruriyah, Selly Rahmawati. (2024). Leveraging Machine Learning and SMOTE for Diabetes Prediction: Implementation of an Application Based on Indonesian Hospital Data. International Journal of Communication Networks and Information Security (IJCNIS), 16(4), 1033–1041. Retrieved from https://ijcnis.org/index.php/ijcnis/article/view/7295

Issue

Vol. 16 No. 4 (2024)

Section

Research Articles