Evaluating the performance of AI chatbots in responding to dental implant FAQs: A comparative study

dc.contributor.authorTuzlali, Mesut
dc.contributor.authorBaki, Nagehan
dc.contributor.authorAral, Kubra
dc.contributor.authorAral, Cuneyt Asim
dc.contributor.authorBahce, Erkan
dc.date.accessioned2026-04-04T13:33:07Z
dc.date.available2026-04-04T13:33:07Z
dc.date.issued2025
dc.departmentİnönü Üniversitesi
dc.description.abstractBackground: This study evaluates and compares the performance of five publicly accessible large language model (LLM)-based chatbots (ChatGPT-o1, DeepSeek-R1, Google Gemini Advanced, Claude 3.5 Sonnet, and Perplexity Pro) in responding to frequently asked questions (FAQs) about dental implant treatment. The primary goal was to assess the accuracy, completeness, clarity, relevance, and consistency of the chatbot-generated answers.

Methods: A total of 45 FAQs commonly encountered in clinical practice and in online patient forums regarding dental implants were selected and categorized into nine thematic domains. Each question was submitted to each chatbot individually using a standardized protocol. Responses were assessed independently by a panel of four dental experts and one layperson on a 5-point Likert scale. Statistical analysis was performed in Python (Google Colab).

Results: ChatGPT-o1 achieved the highest overall performance, particularly in relevance (M = 4.99), consistency (M = 4.97), and accuracy (M = 4.96). DeepSeek-R1 followed closely, with strong scores in completeness and relevance. Claude 3.5 Sonnet ranked in the middle, while Gemini Advanced and Perplexity Pro scored lower on completeness and clarity. Significant differences were observed among the chatbots across all criteria (p < 0.001). Inter-rater reliability was high (alpha = 0.87), confirming consistency among evaluators.

Conclusions: AI-driven chatbots showed strong potential for delivering accurate, patient-friendly information about dental implant treatment. However, performance varied considerably across platforms, with ChatGPT-o1 and DeepSeek-R1 proving most reliable. These findings highlight the emerging role of AI chatbots as supplementary tools in dental education and patient communication, while also underscoring the need for continued validation and ethical oversight in clinical applications.
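The abstract reports high inter-rater reliability (alpha = 0.87) among the five evaluators. As a minimal sketch of that kind of check (not the study's actual code or data), Cronbach's alpha over per-rater Likert scores can be computed in plain Python; the `scores` matrix below is hypothetical illustration data.

```python
# Sketch of an inter-rater reliability check via Cronbach's alpha.
# The ratings below are hypothetical illustration data, not the
# study's actual evaluator scores.
from statistics import pvariance

def cronbach_alpha(ratings):
    """ratings: one inner list per rater, aligned by rated item."""
    k = len(ratings)                                  # number of raters
    item_totals = [sum(col) for col in zip(*ratings)] # summed score per item
    rater_var_sum = sum(pvariance(r) for r in ratings)
    total_var = pvariance(item_totals)
    return k / (k - 1) * (1 - rater_var_sum / total_var)

# Five hypothetical raters scoring six responses on a 5-point Likert scale
scores = [
    [5, 4, 4, 3, 5, 4],
    [5, 4, 5, 3, 5, 4],
    [4, 4, 4, 3, 5, 5],
    [5, 5, 4, 2, 5, 4],
    [5, 4, 4, 3, 4, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Values above roughly 0.8, as reported in the study, are conventionally read as good agreement among raters.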
dc.identifier.doi10.1186/s12903-025-06863-w
dc.identifier.issn1472-6831
dc.identifier.issue1
dc.identifier.orcid0000-0002-7366-345X
dc.identifier.orcid0000-0003-4798-4548
dc.identifier.orcid0000-0002-7602-8101
dc.identifier.orcid0000-0001-5389-5571
dc.identifier.pmid41063105
dc.identifier.scopus2-s2.0-105018271536
dc.identifier.scopusqualityQ2
dc.identifier.urihttps://doi.org/10.1186/s12903-025-06863-w
dc.identifier.urihttps://hdl.handle.net/11616/108951
dc.identifier.volume25
dc.identifier.wosWOS:001590971600020
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherBMC
dc.relation.ispartofBMC Oral Health
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250329
dc.subjectDental implant
dc.subjectArtificial intelligence
dc.subjectImplantology
dc.subjectChatGPT
dc.subjectDeepSeek
dc.subjectGoogle Gemini Advanced
dc.subjectClaude
dc.subjectPerplexity Pro
dc.titleEvaluating the performance of AI chatbots in responding to dental implant FAQs: A comparative study
dc.typeArticle