Evidence bound clinical decision support with RAG
| dc.contributor.author | Sener, Leyla | |
| dc.contributor.author | Yilmaz, Omit | |
| dc.contributor.author | Dikmen, Can | |
| dc.contributor.author | Ari, Ali | |
| dc.contributor.author | Karadag, Teoman | |
| dc.date.accessioned | 2026-04-04T13:32:56Z | |
| dc.date.available | 2026-04-04T13:32:56Z | |
| dc.date.issued | 2025 | |
| dc.department | İnönü Üniversitesi | |
| dc.description.abstract | Large language models are increasingly consulted for scientific and clinical questions, yet they still produce ungrounded answers too often to be trusted on their own. We built a retrieval-augmented assistant that keeps generation tied to a curated, versioned corpus and records every step from ingestion to answer. Documents are segmented with a practical, token-aware policy and encoded locally; vectors are stored with provenance so the system can cite or abstain. Queries are embedded, the top-k passages are retrieved from a vector store, and a prompt asks the generator to respond only with supported statements or to decline. The components are intentionally swappable: the embedder runs on-premises for privacy, the store supports snapshots for repeatable experiments, and the generator (Gemma/Gemma2) is selected for efficient inference. Beyond the pipeline, we preregister an evaluation plan that measures retrieval quality, answer faithfulness, and coverage, with ablations on chunk size, overlap, and k. All code, defaults, and scripts are released so others can reproduce the setup, compare their own choices, and extend the system to new domains. The goal is threefold: reduce hallucination by grounding answers in the literature, keep costs and latency predictable on a single-GPU server, and make empirical evaluation routine rather than optional. Experimental evaluation confirmed these design claims: the proposed modular RAG system achieved Recall@k = 0.86, F1 = 0.79, and Attribution Accuracy = 0.91, significantly outperforming both the Classic RAG and LLM-only baselines (p < 0.05). These results validate the framework's reliability, grounding fidelity, and reproducibility for evidence-based clinical decision support. | |
| dc.identifier.doi | 10.2339/politeknik.1810629 | |
| dc.identifier.issn | 1302-0900 | |
| dc.identifier.issn | 2147-9429 | |
| dc.identifier.orcid | 0009-0000-0039-0801 | |
| dc.identifier.orcid | 0000-0002-7682-7771 | |
| dc.identifier.orcid | 0000-0002-5071-6790 | |
| dc.identifier.uri | https://doi.org/10.2339/politeknik.1810629 | |
| dc.identifier.uri | https://hdl.handle.net/11616/108784 | |
| dc.identifier.wos | WOS:001622046100001 | |
| dc.identifier.wosquality | Q4 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.language.iso | tr | |
| dc.publisher | Gazi Univ | |
| dc.relation.ispartof | Journal of Polytechnic-Politeknik Dergisi | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | KA_WOS_20250329 | |
| dc.subject | Retrieval-augmented generation | |
| dc.subject | clinical decision support | |
| dc.subject | hallucination mitigation | |
| dc.subject | information retrieval | |
| dc.subject | explainable ai | |
| dc.subject | large language model | |
| dc.title | Evidence bound clinical decision support with RAG | |
| dc.type | Article |
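
The pipeline summarized in the abstract (token-aware chunking with overlap, embedding, top-k retrieval with provenance, a cite-or-abstain prompt, and Recall@k) can be illustrated with a minimal sketch. This is not the authors' released code. Everything here is an assumption made for illustration: whitespace splitting stands in for a real tokenizer, a bag-of-words `Counter` stands in for the locally hosted encoder, a Python list stands in for the vector store, and the function names, the `min_score` threshold, and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def chunk_tokens(text, size=8, overlap=2):
    """Split whitespace tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real local encoder)."""
    return Counter(w.strip(".,;:?!").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, store, k=2, min_score=0.2):
    """Return up to k (score, record) pairs scoring at least min_score, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(rec["text"])), rec) for rec in store]
    scored = [(s, r) for s, r in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query, hits):
    """Return a grounded prompt, or None to signal that the system should abstain."""
    if not hits:
        return None
    evidence = "\n".join(f"[{r['source']}#{r['chunk_id']}] {r['text']}" for _, r in hits)
    return ("Answer using only the passages below and cite their ids. "
            "If they do not support an answer, decline.\n"
            f"{evidence}\nQuestion: {query}")


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear among the first k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


# Hypothetical two-document corpus; each chunk keeps its provenance.
docs = {
    "hygiene_note": "Hand hygiene reduces the spread of infection in hospital wards.",
    "chunking_note": "Chunk overlap preserves context across passage boundaries.",
}
store = [
    {"source": name, "chunk_id": i, "text": chunk}
    for name, text in docs.items()
    for i, chunk in enumerate(chunk_tokens(text))
]
```

Calling `build_prompt(q, retrieve(q, store))` yields a prompt with source-tagged passages when something relevant is found, and `None` when nothing clears the threshold, which is one simple way to realize the cite-or-abstain behavior. In a real deployment the chunk size, overlap, and k would be the ablation variables named in the abstract.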











