Explainable artificial intelligence model for identifying COVID-19 gene biomarkers

dc.authorid | ÇOLAK, CEMİL/0000-0001-5406-098X
dc.authorid | Akbulut, Sami/0000-0002-6864-7711
dc.authorid | Yagin, Fatma Hilal/0000-0002-9848-7958
dc.authorid | Azzeh, Mohammad/0000-0002-0323-6452
dc.authorid | Alkhateeb, Abedalrhman/0000-0002-1751-7570
dc.authorwosid | ÇOLAK, CEMİL/ABI-3261-2020
dc.authorwosid | Akbulut, Sami/L-9568-2014
dc.authorwosid | Yagin, Fatma Hilal/ABI-8066-2020
dc.authorwosid | Azzeh, Mohammad/G-5472-2017
dc.contributor.author | Yagin, Fatma Hilal
dc.contributor.author | Cicek, Ipek Balikci
dc.contributor.author | Alkhateeb, Abedalrhman
dc.contributor.author | Yagin, Burak
dc.contributor.author | Colak, Cemil
dc.contributor.author | Azzeh, Mohammad
dc.contributor.author | Akbulut, Sami
dc.date.accessioned | 2024-08-04T20:53:23Z
dc.date.available | 2024-08-04T20:53:23Z
dc.date.issued | 2023
dc.department | İnönü Üniversitesi | en_US
dc.description.abstract | Aim: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods, based on machine learning techniques, to COVID-19 metagenomic next-generation sequencing (mNGS) samples. Methods: The data set used in the study contains the expression levels of 15,979 genes from 234 patients: 141 (60.3%) COVID-19 negative and 93 (39.7%) COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine-Synthetic Minority Oversampling Technique (SVM-SMOTE) was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) was applied to identify candidate COVID-19-associated biomarker genes and to improve the interpretability of the final model. Results: For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to the SHAP analysis, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that high IFI27 expression, in particular, increased the predicted probability of the positive class. Conclusions: The proposed XGBoost model predicted COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain biomarker-based COVID-19 predictions and give clinicians an intuitive, interpretable view of how the risk factors in the model affect its output. | en_US
dc.description.sponsorship | King Abdullah I School of Graduate Studies and Scientific Research at the Princess Sumaya University for Technology [2021/2022 -25 (16)]; [2022/4040] | en_US
dc.description.sponsorship | Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Institute's Clinical Research Ethics Committee (protocol code 2022/4040). Funding: This work has been partially supported by the King Abdullah I School of Graduate Studies and Scientific Research at the Princess Sumaya University for Technology with grant number 2021/2022 -25 (16), received by Abedalrhman Alkhateeb. | en_US
dc.identifier.doi | 10.1016/j.compbiomed.2023.106619
dc.identifier.issn | 0010-4825
dc.identifier.issn | 1879-0534
dc.identifier.pmid | 36738712 | en_US
dc.identifier.scopus | 2-s2.0-85147196638 | en_US
dc.identifier.scopusquality | Q1 | en_US
dc.identifier.uri | https://doi.org/10.1016/j.compbiomed.2023.106619
dc.identifier.uri | https://hdl.handle.net/11616/101144
dc.identifier.volume | 154 | en_US
dc.identifier.wos | WOS:000931797500001 | en_US
dc.identifier.wosquality | Q1 | en_US
dc.indekslendigikaynak | Web of Science | en_US
dc.indekslendigikaynak | Scopus | en_US
dc.indekslendigikaynak | PubMed | en_US
dc.language.iso | en | en_US
dc.publisher | Pergamon-Elsevier Science Ltd | en_US
dc.relation.ispartof | Computers in Biology and Medicine | en_US
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US
dc.rights | info:eu-repo/semantics/openAccess | en_US
dc.subject | COVID-19 | en_US
dc.subject | Explainable artificial intelligence | en_US
dc.subject | LIME | en_US
dc.subject | SHAP | en_US
dc.subject | XGBoost | en_US
dc.title | Explainable artificial intelligence model for identifying COVID-19 gene biomarkers | en_US
dc.type | Article | en_US
