Handling imbalanced class problem for the prediction of atrial fibrillation in obese patient

dc.authoridÇOLAK, CEMİL/0000-0001-5406-098X
dc.authoridARSLAN, Ahmet Kadir/0000-0001-8626-9542
dc.authoridkaraaslan, erol/0000-0002-8534-3680
dc.authoridErdil, Nevzat/0000-0002-8275-840X
dc.authorwosidÇOLAK, CEMİL/ABI-3261-2020
dc.authorwosidARSLAN, Ahmet Kadir/AAA-2409-2020
dc.authorwosidkaraaslan, erol/ABI-2700-2020
dc.authorwosidErdil, Nevzat/K-8079-2019
dc.authorwosidColak, M. Cengiz/ABI-3394-2020
dc.contributor.authorColak, Cengiz M.
dc.contributor.authorKaraaslan, Erol
dc.contributor.authorColak, Cemil
dc.contributor.authorArslan, Ahmet Kadir
dc.contributor.authorErdil, Nevzat
dc.date.accessioned2024-08-04T20:43:07Z
dc.date.available2024-08-04T20:43:07Z
dc.date.issued2017
dc.departmentİnönü Üniversitesien_US
dc.description.abstractObjective: Atrial Fibrillation (AF) is one of the important public health problems with elevated comorbidity, advanced mortality risk, and increasing healthcare costs. In this study, the objective is to explore and resolve the imbalanced class problem for the prediction of AF in obese individuals and to compare the predictive results of balanced and imbalanced datasets by several data mining approaches. Materials and methods: The retrospective study contained 362 successive obese individuals undergoing Coronary Artery Bypass Grafting (CABG) operation at the cardiovascular surgery clinic. AF developed postoperatively (AF Group) in 42 of the patients, whereas AF did not develop (non-AF Group) in 320 individuals. The Synthetic Minority Over-sampling Technique (SMOTE) was performed to balance the distribution of the target variable (AF/non-AF groups). The LogitBoost and GLMBoost ensemble approaches were constructed with 10-fold cross validation. Results: After applying SMOTE algorithm, the number of subjects in AF and non-AF was almost balanced (336 in AF and 320 in non-AF groups). The values of accuracy were 0.8812 (0.8433-0.9127) for GLMBoost and 0.9144 (0.8806-0.9411) for LogitBoost on the imbalanced dataset, and 0.8247 (0.7934-0.853) for GLMBoost and 0.9695 (0.9533-0.9813) for LogitBoost on the balanced dataset by SMOTE. The values of the area under the receiver operating curve for GLMBoost and LogitBoost were 0.5088 (0.485-0.5325) and 0.6827 (0.608-0.7573) on imbalanced dataset, and were 0.8259 (0.7971-0.8546) and 0.9696 (0.9564-0.9827) on balanced dataset, respectively. Conclusions: The predicted results indicated that LogitBoost on the balanced dataset by SMOTE had the highest and most accurate values of performance metrics. Hence, SMOTE and other oversampling approaches may be beneficial to overcome class imbalance issues emerging in biomedical studies.en_US
dc.description.sponsorshipInonu University Scientific Research Coordination Unit [2016/61]en_US
dc.description.sponsorshipWe would like to thank the Inonu University Scientific Research Coordination Unit for supporting this study with a grant (Project number: 2016/61).en_US
dc.identifier.endpage3299en_US
dc.identifier.issn0970-938X
dc.identifier.issn0976-1683
dc.identifier.issue7en_US
dc.identifier.scopus2-s2.0-85018508232en_US
dc.identifier.scopusqualityN/Aen_US
dc.identifier.startpage3293en_US
dc.identifier.urihttps://hdl.handle.net/11616/97795
dc.identifier.volume28en_US
dc.identifier.wosWOS:000403452400080en_US
dc.identifier.wosqualityN/Aen_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.language.isoenen_US
dc.publisherAllied Acaden_US
dc.relation.ispartofBiomedical Research-Indiaen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectImbalanced dataset classificationen_US
dc.subjectAtrial fibrillationen_US
dc.subjectGLMBoosten_US
dc.subjectLogitBoosten_US
dc.subjectSynthetic minority oversampling techniqueen_US
dc.titleHandling imbalanced class problem for the prediction of atrial fibrillation in obese patienten_US
dc.typeArticleen_US
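
The abstract reports that SMOTE raised the AF group from 42 to 336 subjects against 320 non-AF subjects. The record does not state the oversampling rate or the software used, so the following is only a minimal sketch of the general SMOTE idea, not the authors' implementation. It assumes 7 synthetic rows per original minority row (an assumption chosen because 42 + 42 × 7 = 336), 5 nearest neighbours, and made-up toy features; the function name `smote` and all parameters are hypothetical.

```python
import math
import random


def smote(minority, n_new_per_sample, k=5, seed=0):
    """Create synthetic minority rows by interpolating each row toward
    one of its k nearest minority-class neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest neighbours of x among the other minority rows
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_new_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # random position on the segment x -> nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy data only (NOT the study data): 42 minority rows, 3 invented features.
toy_rng = random.Random(1)
minority = [[toy_rng.gauss(0, 1) for _ in range(3)] for _ in range(42)]

# Assumed rate: 7 synthetic rows per original row.
synthetic = smote(minority, n_new_per_sample=7)
balanced_minority = minority + synthetic
print(len(balanced_minority))  # 42 + 42 * 7 = 336
```

In practice, oversampling is usually applied inside each cross-validation training fold rather than to the full dataset, so that synthetic rows derived from a test-fold subject do not leak into training; the record does not say which order the study used.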
