Prediction of COVID-19 Based on Genomic Biomarkers of Metagenomic Next-Generation Sequencing Data Using Artificial Intelligence Technology

Authors: Akbulut, Sami; Yagin, Fatma Hilal; Colak, Cemil
Record dates: 2024-08-04, 2024-08-04
Year: 2022
ISSN: 2149-2247, 2149-2549
https://doi.org/10.14744/etd.2022.00868
https://search.trdizin.gov.tr/yayin/detay/1173238
https://hdl.handle.net/11616/92917

Abstract

Objective: The primary aim of this study was to use metagenomic next-generation sequencing (mNGS) data to identify coronavirus disease 2019 (COVID-19)-related biomarker genes and to construct a machine learning model that could successfully differentiate patients with COVID-19 from healthy controls.

Materials and Methods: The mNGS dataset used in the study comprised expression data for 15,979 genes in the upper airway of 234 patients, both COVID-19 negative and COVID-19 positive. The Boruta method was used to select qualitative biomarker genes associated with COVID-19. Random forest (RF), gradient boosting tree (GBT), and multi-layer perceptron (MLP) models were used to predict COVID-19 based on the selected biomarker genes.

Results: The MLP model (0.936) outperformed the GBT (0.851) and RF (0.809) models in predicting COVID-19. The three most important biomarker candidate genes associated with COVID-19 were IFI27, TPTI, and FAM83A.

Conclusion: The proposed model (MLP) was able to predict COVID-19 successfully. The results showed that the generated model and selected biomarker candidate genes can be used as diagnostic models for clinical testing or as potential therapeutic targets and for vaccine design.

Language: en
Access: info:eu-repo/semantics/openAccess
Keywords: Artificial intelligence; Boruta; COVID-19 pandemic; feature selection; multi-layer perceptron; SARS-CoV-2 virus
Type: Article
DOI: 10.14744/etd.2022.00868
TR Dizin ID: 1173238
WOS:000821276800001
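
Illustrative note: the abstract names the Boruta method for selecting biomarker genes but gives no implementation details. The sketch below shows only the core idea behind Boruta-style selection, which is to compare each real feature against shuffled "shadow" copies and keep the features that consistently beat the best shadow. It is not the authors' pipeline. As a loudly stated simplification, it scores features by absolute correlation with the class label, whereas Boruta itself uses random forest importances and a statistical test over rounds. The data are synthetic, and every name and parameter (`shadow_feature_selection`, `n_rounds`, `min_hit_rate`, `make_synthetic_data`) is hypothetical.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between a feature column and 0/1 labels."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_feature_selection(columns, labels, n_rounds=20, min_hit_rate=0.9, seed=0):
    """Keep features whose score beats the best shadow feature in most rounds.

    Simplification (assumption): the score is a univariate correlation,
    not the random forest importance used by the real Boruta method.
    """
    rng = random.Random(seed)
    real_scores = [abs_correlation(col, labels) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # Build one shadow per real feature by shuffling its values,
        # which destroys any relationship with the labels.
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_correlation(shadow, labels))
        # A real feature scores a "hit" when it beats the best shadow.
        for j, score in enumerate(real_scores):
            if score > shadow_max:
                hits[j] += 1
    return [j for j, h in enumerate(hits) if h / n_rounds >= min_hit_rate]


def make_synthetic_data(n_samples=234, n_features=6, n_informative=2, seed=1):
    """Synthetic stand-in for expression data; not the study's dataset."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # alternating 0/1 classes
    columns = []
    for j in range(n_features):
        # Informative features are shifted upward in the positive class.
        shift = 2.0 if j < n_informative else 0.0
        columns.append([rng.gauss(shift * y, 1.0) for y in labels])
    return columns, labels


columns, labels = make_synthetic_data()
selected = shadow_feature_selection(columns, labels)
print("selected feature indices:", selected)
```

In a workflow like the one the abstract describes, the selected columns would then be passed to the classifiers (RF, GBT, MLP) for training and evaluation; that step is omitted here.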