Assessment of ChatGPT-4 in family medicine board examinations using advanced AI learning and analytical methods: observational study

dc.contributor.author: Goodings, Anthony James
dc.contributor.author: Kajitani, Sten
dc.contributor.author: Chhor, Allison
dc.contributor.author: Albakri, Ahmad
dc.contributor.author: Pastrak, Mila
dc.contributor.author: Kodancha, Megha
dc.contributor.author: Ives, Rowan
dc.contributor.author: Lee, Yoo Bin
dc.contributor.author: Kajitani, Kari
dc.date.accessioned: 2024-11-11T16:51:44Z
dc.date.available: 2024-11-11T16:51:44Z
dc.date.issued: 2024
dc.description.abstract: Background: This research explores the capabilities of ChatGPT-4 in passing the American Board of Family Medicine (ABFM) Certification Examination. Addressing a gap in existing literature, where earlier artificial intelligence (AI) models showed limitations in medical board examinations, this study evaluates the enhanced features and potential of ChatGPT-4, especially in document analysis and information synthesis. Objective: The primary goal is to assess whether ChatGPT-4, when provided with extensive preparation resources and when using sophisticated data analysis, can achieve a score equal to or above the passing threshold for the Family Medicine Board Examinations. Methods: In this study, ChatGPT-4 was embedded in a specialized subenvironment, “AI Family Medicine Board Exam Taker,” designed to closely mimic the conditions of the ABFM Certification Examination. This subenvironment enabled the AI to access and analyze a range of relevant study materials, including a primary medical textbook and supplementary web-based resources. The AI was presented with a series of ABFM-type examination questions, reflecting the breadth and complexity typical of the examination. Emphasis was placed on assessing the AI’s ability to interpret and respond to these questions accurately, leveraging its advanced data processing and analysis capabilities within this controlled subenvironment. Results: In our study, ChatGPT-4’s performance was quantitatively assessed on 300 practice ABFM examination questions. The AI achieved a correct response rate of 88.67% (95% CI 85.08%-92.25%) for the Custom Robot version and 87.33% (95% CI 83.57%-91.10%) for the Regular version. Statistical analysis, including the McNemar test (P=.45), indicated no significant difference in accuracy between the 2 versions. In addition, the chi-square test for error-type distribution (P=.32) revealed no significant variation in the pattern of errors across versions. These results highlight ChatGPT-4’s capacity for high-level performance and consistency in responding to complex medical examination questions under controlled conditions. Conclusions: The study demonstrates that ChatGPT-4, particularly when equipped with specialized preparation and when operating in a tailored subenvironment, shows promising potential in handling the intricacies of medical board examinations. While its performance is comparable with the expected standards for passing the ABFM Certification Examination, further enhancements in AI technology and tailored training methods could push these capabilities to new heights. This exploration opens avenues for integrating AI tools such as ChatGPT-4 in medical education and assessment, emphasizing the importance of continuous advancement and specialized training in medical applications of AI.
dc.description.status: Peer reviewed
dc.description.version: Published Version
dc.format.mimetype: application/pdf
dc.identifier.articleid: e56128
dc.identifier.citation: Goodings, A.J., Kajitani, S., Chhor, A., Albakri, A., Pastrak, M., Kodancha, M., Ives, R., Lee, Y.B. and Kajitani, K. (2024) ‘Assessment of ChatGPT-4 in family medicine board examinations using advanced AI learning and analytical methods: observational study’, JMIR Medical Education, 10, e56128 (8pp). https://doi.org/10.2196/56128
dc.identifier.doi: 10.2196/56128
dc.identifier.endpage: 8
dc.identifier.journaltitle: JMIR Medical Education
dc.identifier.startpage: 1
dc.identifier.uri: https://hdl.handle.net/10468/16638
dc.identifier.volume: 10
dc.language.iso: en
dc.publisher: JMIR Publications
dc.rights: © Anthony James Goodings, Sten Kajitani, Allison Chhor, Ahmad Albakri, Mila Pastrak, Megha Kodancha, Rowan Ives, Yoo Bin Lee, Kari Kajitani. Originally published in JMIR Medical Education (https://mededu.jmir.org), 8.10.2024. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Education, is properly cited.
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: ChatGPT-4
dc.subject: Family Medicine Board Examination
dc.subject: Artificial intelligence in medical education
dc.subject: AI performance assessment
dc.subject: Prompt engineering
dc.subject: ChatGPT
dc.subject: Artificial intelligence
dc.subject: AI
dc.subject: Medical education
dc.subject: Assessment
dc.subject: Observational
dc.subject: Analytical method
dc.subject: Data analysis
dc.subject: Examination
dc.title: Assessment of ChatGPT-4 in family medicine board examinations using advanced AI learning and analytical methods: observational study
dc.type: Article (peer-reviewed)
Files
Original bundle (1 file)
Name: mededu-2024-1-e56128.pdf
Size: 378.79 KB
Format: Adobe Portable Document Format
Description: Published Version
License bundle (1 file)
Name: license.txt
Size: 2.71 KB
Description: Item-specific license agreed upon at submission