Handling imbalanced class problem for the prediction of atrial fibrillation in obese patient
Date
2017
Publisher
Allied Acad
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Objective: Atrial fibrillation (AF) is an important public health problem associated with elevated comorbidity, increased mortality risk, and rising healthcare costs. This study aims to explore and resolve the class imbalance problem in predicting AF in obese individuals and to compare the predictive performance of several data mining approaches on balanced and imbalanced datasets.
Materials and methods: This retrospective study included 362 consecutive obese individuals who underwent coronary artery bypass grafting (CABG) at the cardiovascular surgery clinic. AF developed postoperatively in 42 patients (AF group) and did not develop in 320 (non-AF group). The Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the distribution of the target variable (AF/non-AF groups). The LogitBoost and GLMBoost ensemble approaches were constructed with 10-fold cross-validation.
Results: After SMOTE was applied, the two groups were nearly balanced (336 in the AF group and 320 in the non-AF group). Accuracy was 0.8812 (0.8433-0.9127) for GLMBoost and 0.9144 (0.8806-0.9411) for LogitBoost on the imbalanced dataset, and 0.8247 (0.7934-0.853) for GLMBoost and 0.9695 (0.9533-0.9813) for LogitBoost on the SMOTE-balanced dataset. The area under the receiver operating characteristic curve for GLMBoost and LogitBoost was 0.5088 (0.485-0.5325) and 0.6827 (0.608-0.7573) on the imbalanced dataset, and 0.8259 (0.7971-0.8546) and 0.9696 (0.9564-0.9827) on the balanced dataset, respectively.
Conclusions: LogitBoost on the SMOTE-balanced dataset achieved the highest values of the performance metrics. Hence, SMOTE and other oversampling approaches may help overcome class imbalance issues arising in biomedical studies.
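To make the oversampling step concrete, the following is a minimal pure-Python sketch of SMOTE-style interpolation; it is not the authors' implementation. The abstract does not state the oversampling rate or the number of neighbours, so `n_per_sample=7` and `k=5` are assumptions, chosen only because 42 original cases plus 7 synthetic cases each gives 42 × 8 = 336, which matches the reported AF count. The function name `smote` and the two-feature random data are hypothetical stand-ins for the study's variables.

```python
import math
import random


def smote(minority, n_per_sample, k=5, seed=0):
    """Return synthetic points built by SMOTE-style interpolation.

    minority: list of numeric feature tuples for the minority class.
    n_per_sample: synthetic points to create per original point (assumed).
    k: number of nearest minority neighbours to draw from (assumed).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority points by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical two-feature data standing in for the 42 AF cases.
data_rng = random.Random(1)
af_cases = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(42)]

new_cases = smote(af_cases, n_per_sample=7)
balanced_af = af_cases + new_cases
print(len(balanced_af))  # 42 original + 294 synthetic = 336
```

The reported figures also show why accuracy alone can mislead on the imbalanced data: a model that always predicts non-AF would be correct for 320 of 362 patients (about 0.884), which is close to the GLMBoost accuracy of 0.8812 even though its area under the curve of 0.5088 is near chance.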
Keywords
Imbalanced dataset classification, Atrial fibrillation, GLMBoost, LogitBoost, Synthetic minority oversampling technique
Source
Biomedical Research-India
WoS Q Value
N/A
Scopus Q Value
N/A
Volume
28
Issue
7