Interpretable Machine Learning for Serum-Based Metabolomics in Breast Cancer Diagnostics: Insights from Multi-Objective Feature Selection-Driven LightGBM-SHAP Models

dc.contributor.authorGuldogan, Emek
dc.contributor.authorYagin, Fatma Hilal
dc.contributor.authorUcuzal, Hasan
dc.contributor.authorAlzakari, Sarah A.
dc.contributor.authorAlhussan, Amel Ali
dc.contributor.authorArdigo, Luca Paolo
dc.date.accessioned2026-04-04T13:30:59Z
dc.date.available2026-04-04T13:30:59Z
dc.date.issued2025
dc.departmentİnönü Üniversitesi
dc.description.abstractBackground and Objectives: Breast cancer accounts for 12.5% of all new cancer cases in women worldwide. Early detection significantly improves survival rates, but traditional biomarkers such as CA 15-3 and HER2 lack sensitivity and specificity, particularly for early-stage disease. Advances in metabolomics and machine learning, particularly explainable artificial intelligence (XAI), offer new opportunities for identifying robust biomarkers and improving diagnostic accuracy. This study aimed to identify and validate serum-based metabolic biomarkers for breast cancer using advanced metabolomic profiling techniques and a Light Gradient Boosting Machine (LightGBM) model. Additionally, SHapley Additive exPlanations (SHAP) were applied to enhance model interpretability and biological insight. Materials and Methods: The study included 103 breast cancer patients and 31 healthy controls. Serum samples were analyzed by liquid and gas chromatography-time-of-flight mass spectrometry (LC-TOFMS and GC-TOFMS). Mutual Information (MI), Sparse Partial Least Squares (sPLS), Boruta, and Multi-Objective Feature Selection (MOFS) approaches were applied to the data for biomarker discovery. LightGBM, AdaBoost, and Random Forest were employed for classification, and class imbalance was addressed with the Synthetic Minority Oversampling Technique (SMOTE). SHAP analysis ranked metabolites based on their contribution to model predictions. Results: Compared to the other feature selection approaches, MOFS was more robust in terms of predictive performance, and the metabolites it identified were used in subsequent analyses for biomarker discovery. LightGBM outperformed the AdaBoost and Random Forest models, achieving 86.6% accuracy, 89.1% sensitivity, 84.2% specificity, and an F1-score of 87.0%.
SHAP analysis identified 2-aminobutyric acid, choline, and coproporphyrin as the most influential metabolites, with dysregulation of these markers associated with breast cancer risk. Conclusions: This study is among the first to integrate SHAP explainability with metabolomic profiling, bridging computational predictions and biological insights for improved clinical adoption. It demonstrates the effectiveness of combining metabolomics with XAI-driven machine learning for breast cancer diagnostics. The identified biomarkers not only improve diagnostic accuracy but also reveal critical metabolic dysregulations associated with disease progression.
dc.description.sponsorshipPrincess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia [PNURSP2025R716]
dc.description.sponsorshipThis study was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R716), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
dc.identifier.doi10.3390/medicina61061112
dc.identifier.issn1010-660X
dc.identifier.issn1648-9144
dc.identifier.issue6
dc.identifier.orcid0000-0001-7677-5070
dc.identifier.orcid0000-0002-9848-7958
dc.identifier.orcid0000-0001-7530-7961
dc.identifier.orcid0000-0003-4870-3015
dc.identifier.orcid0000-0002-5436-8164
dc.identifier.pmid40572800
dc.identifier.scopus2-s2.0-105009137249
dc.identifier.scopusqualityQ1
dc.identifier.urihttps://doi.org/10.3390/medicina61061112
dc.identifier.urihttps://hdl.handle.net/11616/108506
dc.identifier.volume61
dc.identifier.wosWOS:001515896100001
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherMDPI
dc.relation.ispartofMedicina-Lithuania
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250329
dc.subjectbreast cancer
dc.subjectmetabolomics
dc.subjectexplainable AI
dc.subjectLightGBM
dc.subjectSHAP
dc.subjectbiomarkers
dc.subjectdiagnostic accuracy
dc.titleInterpretable Machine Learning for Serum-Based Metabolomics in Breast Cancer Diagnostics: Insights from Multi-Objective Feature Selection-Driven LightGBM-SHAP Models
dc.typeArticle
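
The abstract reports accuracy, sensitivity, specificity, and F1-score for the LightGBM classifier. As a minimal sketch of how these four metrics follow from confusion-matrix counts, the Python snippet below uses hypothetical counts (not the study's data) and illustrative names (`ConfusionCounts`, `diagnostic_metrics`) that do not come from the paper. It assumes breast cancer is the positive class and that every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with cancer as the positive class."""
    tp: int  # cancer samples predicted as cancer
    fn: int  # cancer samples predicted as control
    tn: int  # control samples predicted as control
    fp: int  # control samples predicted as cancer


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four metrics named in the abstract from raw counts."""
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts chosen only for illustration; they are NOT the study's results.
counts = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are each computed within a single class, so they do not depend on the ratio of patients to controls. Accuracy does depend on that ratio, which is one reason class imbalance matters when comparing classifiers.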
