Evaluating the performance of AI chatbots in responding to dental implant FAQs: A comparative study

dc.contributor.authorTuzlali, Mesut
dc.contributor.authorBaki, Nagehan
dc.contributor.authorAral, Kubra
dc.contributor.authorAral, Cuneyt Asim
dc.contributor.authorBahce, Erkan
dc.date.accessioned2026-04-04T13:33:07Z
dc.date.available2026-04-04T13:33:07Z
dc.date.issued2025
dc.departmentİnönü Üniversitesi
dc.description.abstractBackground: This study evaluates and compares the performance of five publicly accessible large language model (LLM)-based chatbots (ChatGPT-o1, DeepSeek-R1, Google Gemini Advanced, Claude 3.5 Sonnet, and Perplexity Pro) in responding to frequently asked questions (FAQs) about dental implant treatment. The primary goal was to assess the accuracy, completeness, clarity, relevance, and consistency of the chatbot-generated answers.

Methods: A total of 45 FAQs commonly encountered in clinical practice and in online patient forums regarding dental implants were selected and categorized into nine thematic domains. Each question was submitted to each chatbot individually using a standardized protocol. Responses were assessed independently by a panel of four dental experts and one layperson on a 5-point Likert scale. Statistical analysis was performed in Python (Google Colab).

Results: ChatGPT-o1 achieved the highest overall performance, particularly in relevance (M = 4.99), consistency (M = 4.97), and accuracy (M = 4.96). DeepSeek-R1 followed closely, with strong scores in completeness and relevance. Claude 3.5 Sonnet ranked in the middle, while Gemini Advanced and Perplexity Pro scored lower on completeness and clarity. Significant differences were observed among the chatbots across all criteria (p < 0.001). Inter-rater reliability was high (alpha = 0.87), confirming consistency among evaluators.

Conclusions: AI-driven chatbots showed strong potential for delivering accurate, patient-friendly information about dental implant treatment. However, performance varied considerably across platforms, with ChatGPT-o1 and DeepSeek-R1 proving most reliable. These findings highlight the emerging role of AI chatbots as supplementary tools in dental education and patient communication, while also underscoring the need for continued validation and ethical oversight in clinical applications.
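The abstract reports high inter-rater reliability (alpha = 0.87) among the five evaluators. As a minimal sketch of that kind of check (not the study's actual code or data), Cronbach's alpha over per-rater Likert scores can be computed in plain Python; the `scores` matrix below is hypothetical illustration data.

```python
# Sketch of an inter-rater reliability check via Cronbach's alpha.
# The ratings below are hypothetical illustration data, not the
# study's actual evaluator scores.
from statistics import pvariance

def cronbach_alpha(ratings):
    """ratings: one inner list per rater, aligned by rated item."""
    k = len(ratings)                                  # number of raters
    item_totals = [sum(col) for col in zip(*ratings)] # summed score per item
    rater_var_sum = sum(pvariance(r) for r in ratings)
    total_var = pvariance(item_totals)
    return k / (k - 1) * (1 - rater_var_sum / total_var)

# Five hypothetical raters scoring six responses on a 5-point Likert scale
scores = [
    [5, 4, 4, 3, 5, 4],
    [5, 4, 5, 3, 5, 4],
    [4, 4, 4, 3, 5, 5],
    [5, 5, 4, 2, 5, 4],
    [5, 4, 4, 3, 4, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Values above roughly 0.8, as reported in the study, are conventionally read as good agreement among raters.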
dc.identifier.doi10.1186/s12903-025-06863-w
dc.identifier.issn1472-6831
dc.identifier.issue1
dc.identifier.orcid0000-0002-7366-345X
dc.identifier.orcid0000-0003-4798-4548
dc.identifier.orcid0000-0002-7602-8101
dc.identifier.orcid0000-0001-5389-5571
dc.identifier.pmid41063105
dc.identifier.scopus2-s2.0-105018271536
dc.identifier.scopusqualityQ2
dc.identifier.urihttps://doi.org/10.1186/s12903-025-06863-w
dc.identifier.urihttps://hdl.handle.net/11616/108951
dc.identifier.volume25
dc.identifier.wosWOS:001590971600020
dc.identifier.wosqualityQ1
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.indekslendigikaynakPubMed
dc.language.isoen
dc.publisherBMC
dc.relation.ispartofBMC Oral Health
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_WOS_20250329
dc.subjectDental implant
dc.subjectArtificial intelligence
dc.subjectImplantology
dc.subjectChatGPT
dc.subjectDeepSeek
dc.subjectGoogle Gemini Advanced
dc.subjectClaude
dc.subjectPerplexity Pro
dc.titleEvaluating the performance of AI chatbots in responding to dental implant FAQs: A comparative study
dc.typeArticle