Predicting lung cancer risk based on artificial intelligence: Leveraging multifactorial inputs for early detection
Date
2025
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Lung cancer remains the leading cause of cancer-related mortality worldwide, largely because most cases are detected at advanced stages. This study develops and validates multifactorial machine-learning models that integrate demographic, behavioural, psychological, symptom-based and comorbidity variables to identify individuals at high risk of lung cancer. An anonymised dataset of 13,000 subjects (74% lung-cancer positive) obtained from the public “Lung Cancer Patient Records” repository was pre-processed through recoding, one-hot encoding and stratified train/test partitioning. To address class imbalance, the training subset was balanced with the Synthetic Minority Oversampling Technique (SMOTE). Three supervised algorithms, namely Logistic Regression, Random Forest and Extreme Gradient Boosting (XGBoost), were tuned via grid search with five-fold stratified cross-validation, optimising the area under the receiver-operating-characteristic curve (AUC). On the independent hold-out set, XGBoost achieved the highest discrimination (AUC = 0.93), sensitivity (0.95) and F1-score (0.93), followed closely by Random Forest (AUC = 0.91). Univariate analyses confirmed significant associations (p < 0.001) between lung cancer status and all candidate predictors, with the strongest effect sizes observed for yellow fingers, persistent cough, wheezing, fatigue and peer-pressure–related smoking. The findings demonstrate that incorporating easily elicited clinical symptoms and psychosocial factors alongside traditional risk markers markedly improves early-detection performance over age–smoking models alone. Because all inputs are non-invasive and low-cost, the proposed model can be embedded in electronic-health-record decision support or mobile triage applications, particularly benefiting resource-limited settings. Future work will focus on external validation across diverse populations, temporal modelling of symptom trajectories and cost-effectiveness analyses to inform risk-tailored low-dose CT screening protocols.
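The abstract states that only the training subset was balanced with SMOTE. The record does not include the authors' code, so the following is only a minimal, standard-library sketch of the SMOTE idea under stated assumptions: a binary label, purely numeric features, and Euclidean distance. The function names (`smote`, `balance_training_set`), the parameter defaults and the toy data are all hypothetical illustrations, not the study's implementation or data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by SMOTE-style interpolation (illustrative)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new row lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_training_set(X_train, y_train, k=5, seed=0):
    """Oversample the smaller class of a binary training set to equal size."""
    by_class = {}
    for x, y in zip(X_train, y_train):
        by_class.setdefault(y, []).append(x)
    minority_label = min(by_class, key=lambda c: len(by_class[c]))
    majority_size = max(len(rows) for rows in by_class.values())
    deficit = majority_size - len(by_class[minority_label])
    new_rows = smote(by_class[minority_label], deficit, k=k, seed=seed)
    # Return new lists; the held-out test data are never passed in here.
    return X_train + new_rows, y_train + [minority_label] * deficit


# Toy training data (made-up values): 6 positives (1) and 3 negatives (0).
X = [[60, 1], [65, 1], [70, 1], [58, 1], [72, 1], [66, 1],
     [40, 0], [45, 0], [50, 0]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0]
X_bal, y_bal = balance_training_set(X, y, k=2, seed=42)
```

Balancing only the training rows keeps the hold-out set at its original class ratio, so reported test metrics are not inflated by synthetic samples. Note that plain interpolation yields fractional values for binary or one-hot columns; the abstract does not say how the authors handled categorical inputs.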
Keywords
Respiratory System, Oncology, Computer Science, Artificial Intelligence
Source
Medicine Science
Volume
14
Issue
4