Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics

Dufey-Portilla, Nicolás, Frisman, Ana Billik, Robles, Maximiliano Gallardo, Peña-Bengoa, Fernando, Ávila, Consuelo Cabrera, Nagendrababu, Venkateshbabu, Dummer, Paul M. H. ORCID: https://orcid.org/0000-0002-0726-7467, Garcia-Font, Marc and Sans, Francesc Abella 2025. Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics. Journal of Applied Oral Science 33, e20250321. doi: 10.1590/1678-7757-2025-0321

PDF - Published Version
Available under License Creative Commons Attribution.

Download (753kB)
License URL: http://creativecommons.org/licenses/by/4.0/
License Start date: 1 January 2025

Abstract

Artificial intelligence (AI) is transforming access to dental information via large language models (LLMs) such as ChatGPT and Google Gemini. Both models are increasingly used in endodontics as a source of information for patients; therefore, as developers release new versions, the validity of their responses must be continuously compared against professional consultation.

Objective: This study aimed to evaluate the validity of the responses provided by the most advanced LLMs [Google Gemini Advanced (GGA) and ChatGPT-4o] to frequently asked questions (FAQs) in endodontics.

Methodology: A cross-sectional analytical study was conducted in five phases. The top 20 endodontic FAQs submitted by users to chatbots, identified via Google Trends, were compiled. Nine academically certified endodontic specialists with educational roles scored the GGA and ChatGPT-4o responses to the FAQs on a five-point Likert scale. Validity was determined using high (4.5-5) and low (≥4) thresholds. Fisher's exact test was used for comparative analysis.

Results: At the low threshold, both models achieved 95% validity (95% CI: 75.1%-99.9%; p=0.05). At the high threshold, ChatGPT-4o achieved 35% validity (95% CI: 15.4%-59.2%) and GGA 40% (95% CI: 19.1%-63.9%) (p=1.0).

Conclusions: ChatGPT-4o and GGA responses showed high validity under lenient criteria, but validity decreased significantly under the stricter threshold, limiting the models' reliability as a stand-alone source of information in endodontics. While AI chatbots show promise for improving patient education in endodontics, their limited validity under rigorous evaluation highlights the need for careful professional oversight.
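The reported percentages are consistent with counts out of the 20 FAQs (19/20 = 95%, 7/20 = 35%, 8/20 = 40%), and the reported confidence intervals match exact (Clopper-Pearson) binomial intervals for those counts. The following Python sketch (not the authors' code; the contingency counts are inferred from the published percentages, not taken from the paper's data) reproduces those intervals and the two-sided Fisher's exact test comparing the two models at the high threshold.

# Minimal sketch, assuming the published percentages correspond to
# counts out of the 20 FAQs:
#   low threshold:  19/20 valid for both models
#   high threshold: ChatGPT-4o 7/20, GGA 8/20
from scipy.stats import binomtest, fisher_exact

N_QUESTIONS = 20

def exact_ci(valid: int, n: int = N_QUESTIONS) -> tuple[float, float]:
    """Clopper-Pearson (exact) 95% CI for a validity proportion."""
    ci = binomtest(valid, n).proportion_ci(confidence_level=0.95, method="exact")
    return ci.low, ci.high

# Exact CIs, consistent with those reported in the abstract.
for label, valid in [("Both models, low threshold", 19),
                     ("ChatGPT-4o, high threshold", 7),
                     ("GGA, high threshold", 8)]:
    low, high = exact_ci(valid)
    print(f"{label}: {valid}/{N_QUESTIONS} = {valid / N_QUESTIONS:.0%} "
          f"(95% CI: {low:.1%}-{high:.1%})")

# Two-sided Fisher's exact test comparing the models at the high
# threshold: rows = models, columns = (valid, not valid).
table = [[7, 13],   # ChatGPT-4o
         [8, 12]]   # Google Gemini Advanced
_, p = fisher_exact(table, alternative="two-sided")
print(f"Fisher's exact test, high threshold: p = {p:.2f}")  # p = 1.00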

Item Type: Article
Date Type: Publication
Status: Published
Schools: Dentistry
Publisher: Universidade de São Paulo
ISSN: 1678-7757
Date of First Compliant Deposit: 13 October 2025
Date of Acceptance: 25 June 2025
Last Modified: 13 Oct 2025 11:30
URI: https://orca.cardiff.ac.uk/id/eprint/181609
