A comparison of multivariate statistical methods to detect risk factors for type 2 diabetes mellitus

dc.contributor.authorÇiçek, İpek Balıkçı
dc.contributor.authorYoloğlu, Saim
dc.contributor.authorŞahin, İbrahim
dc.date.accessioned2024-08-04T19:42:42Z
dc.date.available2024-08-04T19:42:42Z
dc.date.issued2023
dc.departmentİnönü Üniversitesien_US
dc.description.abstractAim: This study compares the performance of Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree models, all of which are machine learning classification methods, in the diagnosis of type 2 Diabetes Mellitus (DM) in order to determine the most successful method. It also uses these models to examine the risk factors affecting type 2 DM. Materials and Methods: The study’s data were collected from patients who visited the Diabetes and Thyroid polyclinic at the Inonu University Faculty of Medicine Turgut Ozal Medical Center, Department of Internal Medicine. Missing values were imputed with the k-Nearest Neighbor algorithm. Sensitivity, accuracy, precision, specificity, AUC, F1-score, and classification error were used as performance evaluation criteria. The parameters of the ANN model were optimized with an evolutionary algorithm. Missing value imputation, modeling and parameter optimization were done with RapidMiner Studio Free version 8.1. Results: Among the three methods applied in the diagnosis of type 2 DM, the ANN gave the best classification performance. The accuracy, sensitivity, specificity, precision, F1-score, AUC and classification error values obtained with this method are, respectively: 98.94%, 100%, 97.73%, 98.04%, 99.01%, 0.978 and 1.06%. For the ANN method, the importance values of the independent variables (gender, long-term drug use, family history, concomitant disease, cortisone use, stress factor, high blood pressure, smoking, high cholesterol, heart disease, exercise status, carbohydrate use, alcohol consumption, vegetable use, meat use, age, weight, height, starting age, daily bread consumption, LDL, HDL, total cholesterol, triglyceride, fasting blood sugar) are, respectively: 0.017, 0.009, 0.013, 0.017, 0.008, 0.016, 0.008, 0.006, 0.053, 0.024, 0.023, 0.040, 0.007, 0.020, 0.007, 0.046, 0.083, 0.049, 0.024, 0.066, 0.084, 0.083, 0.020, 0.031, 0.244. Conclusion: According to the performance criteria obtained from the three classification models used to predict type 2 DM, the ANN model gave the best classification performance. According to the ANN method, the three most important risk factors that may cause type 2 DM were found to be fasting blood glucose, LDL, and HDL, respectively.en_US
dc.identifier.doi10.5455/annalsmedres.2022.09.276
dc.identifier.endpage174en_US
dc.identifier.issn2636-7688
dc.identifier.issue2en_US
dc.identifier.startpage167en_US
dc.identifier.trdizinid1162075en_US
dc.identifier.urihttps://doi.org/10.5455/annalsmedres.2022.09.276
dc.identifier.urihttps://search.trdizin.gov.tr/yayin/detay/1162075
dc.identifier.urihttps://hdl.handle.net/11616/88593
dc.identifier.volume30en_US
dc.indekslendigikaynakTR-Dizinen_US
dc.language.isoenen_US
dc.relation.ispartofAnnals of Medical Researchen_US
dc.relation.publicationcategoryArticle - National Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.titleA comparison of multivariate statistical methods to detect risk factors for type 2 diabetes mellitusen_US
dc.typeArticleen_US
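
The performance criteria named in the abstract (accuracy, sensitivity, specificity, precision, F1-score, classification error) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas in Python. The counts used (50 true positives, 1 false positive, 43 true negatives, 0 false negatives) are an assumption: this record does not report the confusion matrix or the test-set size, and the counts were chosen only because they reproduce the percentages quoted in the abstract. The function name `classification_metrics` is likewise hypothetical. AUC is omitted because it cannot be derived from a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary classification metrics as fractions in [0, 1]."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "classification_error": 1 - accuracy,
    }


# Assumed counts (not given in the record), picked to match the reported values.
metrics = classification_metrics(tp=50, fp=1, tn=43, fn=0)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts, the classification error is simply 100% minus the accuracy, which is consistent with the 98.94% and 1.06 pair reported in the abstract.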
