Explainable artificial intelligence model for identifying COVID-19 gene biomarkers
dc.authorid | ÇOLAK, CEMİL/0000-0001-5406-098X | |
dc.authorid | Akbulut, Sami/0000-0002-6864-7711 | |
dc.authorid | Yagin, Fatma Hilal/0000-0002-9848-7958 | |
dc.authorid | Azzeh, Mohammad/0000-0002-0323-6452 | |
dc.authorid | Alkhateeb, Abedalrhman/0000-0002-1751-7570 | |
dc.authorwosid | ÇOLAK, CEMİL/ABI-3261-2020 | |
dc.authorwosid | Akbulut, Sami/L-9568-2014 | |
dc.authorwosid | Yagin, Fatma Hilal/ABI-8066-2020 | |
dc.authorwosid | Azzeh, Mohammad/G-5472-2017 | |
dc.contributor.author | Yagin, Fatma Hilal | |
dc.contributor.author | Cicek, Ipek Balikci | |
dc.contributor.author | Alkhateeb, Abedalrhman | |
dc.contributor.author | Yagin, Burak | |
dc.contributor.author | Colak, Cemil | |
dc.contributor.author | Azzeh, Mohammad | |
dc.contributor.author | Akbulut, Sami | |
dc.date.accessioned | 2024-08-04T20:53:23Z | |
dc.date.available | 2024-08-04T20:53:23Z | |
dc.date.issued | 2023 | |
dc.department | İnönü Üniversitesi | en_US |
dc.description.abstract | Aim: COVID-19 has revealed the need for fast and reliable methods to assist clinicians in diagnosing the disease. This article presents a model that applies explainable artificial intelligence (XAI) methods based on machine learning techniques to COVID-19 metagenomic next-generation sequencing (mNGS) samples. Methods: The data set used in the study contains 15,979 gene expressions from 234 patients, of whom 141 (60.3%) were COVID-19 negative and 93 (39.7%) were COVID-19 positive. The least absolute shrinkage and selection operator (LASSO) method was applied to select genes associated with COVID-19. The Support Vector Machine-Synthetic Minority Oversampling Technique (SVM-SMOTE) method was used to handle the class imbalance problem. Logistic regression (LR), SVM, random forest (RF), and extreme gradient boosting (XGBoost) models were constructed to predict COVID-19. An explainable approach based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) methods was applied to determine COVID-19-associated biomarker candidate genes and improve the final model's interpretability. Results: For the diagnosis of COVID-19, the XGBoost model (accuracy: 0.930) outperformed the RF (accuracy: 0.912), SVM (accuracy: 0.877), and LR (accuracy: 0.912) models. According to SHAP, the three most important genes associated with COVID-19 were IFI27, LGR6, and FAM83A. The LIME results showed that a high level of IFI27 gene expression in particular increased the probability of the positive class. Conclusions: The proposed model (XGBoost) was able to predict COVID-19 successfully. The results show that machine learning combined with LIME and SHAP can explain the biomarker prediction for COVID-19 and give clinicians an intuitive, interpretable view of the impact of risk factors in the model. | en_US |
dc.description.sponsorship | King Abdullah I School of Graduate Studies and Scientific Research at the Princess Sumaya University for Technology [2021/2022 -25 (16)]; [2022/4040] | en_US |
dc.description.sponsorship | Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Institute's Clinical Research Ethics Committee (protocol code = 2022/4040). Funding: This work has been partially supported by King Abdullah I School of Graduate Studies and Scientific Research at the Princess Sumaya University for Technology with grant number 2021/2022 -25 (16), received by Abedalrhman Alkhateeb. | en_US |
dc.identifier.doi | 10.1016/j.compbiomed.2023.106619 | |
dc.identifier.issn | 0010-4825 | |
dc.identifier.issn | 1879-0534 | |
dc.identifier.pmid | 36738712 | en_US |
dc.identifier.scopus | 2-s2.0-85147196638 | en_US |
dc.identifier.scopusquality | Q1 | en_US |
dc.identifier.uri | https://doi.org/10.1016/j.compbiomed.2023.106619 | |
dc.identifier.uri | https://hdl.handle.net/11616/101144 | |
dc.identifier.volume | 154 | en_US |
dc.identifier.wos | WOS:000931797500001 | en_US |
dc.identifier.wosquality | Q1 | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.indekslendigikaynak | PubMed | en_US |
dc.language.iso | en | en_US |
dc.publisher | Pergamon-Elsevier Science Ltd | en_US |
dc.relation.ispartof | Computers in Biology and Medicine | en_US |
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | COVID-19 | en_US |
dc.subject | Explainable artificial intelligence | en_US |
dc.subject | LIME | en_US |
dc.subject | SHAP | en_US |
dc.subject | XGBoost | en_US |
dc.title | Explainable artificial intelligence model for identifying COVID-19 gene biomarkers | en_US |
dc.type | Article | en_US |
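
The abstract ranks genes by SHAP, which attributes a model's output to its input features using Shapley values. The sketch below is not the authors' implementation and does not use the SHAP library. It computes exact Shapley values by enumerating every feature coalition, which is only practical for a handful of features. The scoring function `toy_score`, the expression values in `sample`, and the all-zero `baseline` are hypothetical and chosen only to make the arithmetic easy to follow. They are not data or results from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x, relative to baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only for small illustrative cases.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with feature i still at baseline
                without_i = {
                    f: (x[f] if f in subset else baseline[f]) for f in names
                }
                # Same input with feature i switched to its observed value
                with_i = dict(without_i)
                with_i[i] = x[i]
                total += weight * (value_fn(with_i) - value_fn(without_i))
        phi[i] = total
    return phi


def toy_score(features):
    # Hypothetical score: one additive term plus one interaction term.
    return 2.0 * features["IFI27"] + 0.5 * features["LGR6"] * features["FAM83A"]


# Hypothetical expression values and baseline (not taken from the study).
sample = {"IFI27": 3.0, "LGR6": 2.0, "FAM83A": 4.0}
baseline = {"IFI27": 0.0, "LGR6": 0.0, "FAM83A": 0.0}

phi = shapley_values(toy_score, sample, baseline)
for gene, contribution in phi.items():
    print(f"{gene}: {contribution:.3f}")
```

In this toy case the additive term is credited entirely to IFI27, and the interaction term is split evenly between LGR6 and FAM83A. The contributions sum to the difference between the score at `sample` and the score at `baseline`, which is the efficiency property that makes Shapley-based attributions add up to the model output.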