Evaluation of ChatGPT performance on emergency medicine board examination questions: observational study
dc.contributor.author | Pastrak, Mila | en |
dc.contributor.author | Kajitani, Sten | en |
dc.contributor.author | Goodings, Anthony James | en |
dc.contributor.author | Drewek, Austin | en |
dc.contributor.author | LaFree, Andrew | en |
dc.contributor.author | Murphy, Adrian | en |
dc.date.accessioned | 2025-03-13T14:45:38Z | |
dc.date.available | 2025-03-13T14:45:38Z | |
dc.date.issued | 2025 | en |
dc.description.abstract | Background: The ever-evolving field of medicine has highlighted the potential of ChatGPT as an assistive platform. However, its use in medical board examination preparation and completion remains unclear. Objective: This study aimed to evaluate the performance of a custom-modified version of ChatGPT-4, tailored with emergency medicine board examination preparatory materials (an Anki flashcard deck), compared with its Default version and the previous iteration, ChatGPT-3.5. The goal was to assess the accuracy of ChatGPT-4 in answering board-style questions and its suitability as a tool to aid students and trainees in standardized examination preparation. Methods: A comparative analysis was conducted using a random selection of 598 questions from the Rosh In-Training Examination Question Bank. Three versions of ChatGPT were compared: the Default ChatGPT-4, a Custom ChatGPT-4, and ChatGPT-3.5. Accuracy, response length, performance across medical subdisciplines, and underlying causes of error were analyzed. Results: The Custom version did not demonstrate a significant improvement in accuracy over the Default version (P=.61), although both significantly outperformed ChatGPT-3.5 (P<.001). The Default version produced significantly longer responses than the Custom version, with mean (SD) response lengths of 1371 (444) and 929 (408), respectively (P<.001). Subgroup analysis revealed no significant differences in performance across medical subdisciplines between the versions (P>.05 in all cases). Both ChatGPT-4 versions showed similar underlying error types (P>.05 in all cases) and had a 99% predicted probability of passing, while ChatGPT-3.5 had an 85% probability. Conclusions: These findings suggest that while newer versions of ChatGPT exhibit improved performance on emergency medicine board examination material, specific enhancement with a comprehensive Anki flashcard deck on the topic does not significantly improve accuracy. The study highlights the potential of ChatGPT-4 as a tool for medical education, capable of providing accurate support across a wide range of emergency medicine topics in its default form. | en |
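The abstract's two statistical steps, comparing accuracy between model versions and estimating a probability of passing, can be reproduced in outline with standard tools. The sketch below is illustrative only and is not the authors' analysis code: the correct-answer counts, the 70% pass mark, the 225-item exam length, and the binomial pass model are assumptions for demonstration, not figures or methods confirmed by the paper.

```python
# Minimal sketch (assumed values, not the study's data) of a two-proportion
# chi-square comparison and a binomial pass-probability estimate.
from scipy.stats import chi2_contingency, binom

N_QUESTIONS = 598  # questions sampled from the Rosh question bank

# Hypothetical correct-answer counts for the two ChatGPT-4 versions.
correct_default, correct_custom = 500, 495

# 2x2 contingency table: correct vs incorrect for each version.
table = [
    [correct_default, N_QUESTIONS - correct_default],
    [correct_custom, N_QUESTIONS - correct_custom],
]
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-square P value, Default vs Custom: {p_value:.2f}")

# Treat each exam item as a Bernoulli trial with the observed per-question
# accuracy, then ask how likely the model is to clear an assumed pass mark
# on an assumed exam length.
accuracy = correct_default / N_QUESTIONS
exam_items, pass_mark = 225, 0.70          # assumed exam length and threshold
min_correct = int(exam_items * pass_mark)  # items needed to pass
p_pass = binom.sf(min_correct - 1, exam_items, accuracy)  # P(X >= min_correct)
print(f"predicted probability of passing: {p_pass:.2%}")
```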
dc.description.status | Peer reviewed | en |
dc.description.version | Published Version | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.articleid | e67696 | en |
dc.identifier.citation | Pastrak, M., Kajitani, S., Goodings, A.J., Drewek, A., LaFree, A. and Murphy, A. (2025) ‘Evaluation of ChatGPT performance on emergency medicine board examination questions: observational study’, JMIR AI, 4, pp. e67696 (9pp). https://doi.org/10.2196/67696 | en |
dc.identifier.doi | https://doi.org/10.2196/67696 | en |
dc.identifier.eissn | 2817-1705 | en |
dc.identifier.endpage | 9 | en |
dc.identifier.journaltitle | JMIR AI | en |
dc.identifier.startpage | 1 | en |
dc.identifier.uri | https://hdl.handle.net/10468/17173 | |
dc.identifier.volume | 4 | en |
dc.language.iso | en | en |
dc.publisher | JMIR Publications | en |
dc.rights | © 2025, Mila Pastrak, Sten Kajitani, Anthony James Goodings, Austin Drewek, Andrew LaFree, Adrian Murphy. Originally published in JMIR AI (https://ai.jmir.org). This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included. | en |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | Artificial intelligence | en |
dc.subject | ChatGPT-4 | en |
dc.subject | Medical education | en |
dc.subject | Emergency medicine | en |
dc.subject | Examination | en |
dc.subject | Examination preparation | en |
dc.title | Evaluation of ChatGPT performance on emergency medicine board examination questions: observational study | en |
dc.type | Article (peer-reviewed) | en |