Alireza Keshtkar, Ali-Asghar Hayat, Farnaz Atighi, Nazanin Ayare, Mohammadreza Keshtkar, Parsa Yazdanpanahi, Erfan Sadeghi, Noushin Deilami, Hamid Reihani, Alireza Karimi, Hamidreza Mokhtari, Mohammad Hashem Hashempur
Research Center for Traditional Medicine and History of Medicine, Department of Persian Medicine, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran; hashempur@gmail.com
Abstract:
Background: OpenAI's ChatGPT is a large language model that uses a 175-billion-parameter transformer architecture to perform natural language processing tasks. This study evaluates ChatGPT's knowledge and interpretive ability on three levels of Iran's medical licensing examinations: basic sciences, pre-internship, and pre-residency.
Methods: In this comparative study, the three levels of Iran's medical licensing exams (basic sciences, pre-internship, and pre-residency) were administered to ChatGPT 3.5. Two versions of each exam were used relative to the model's knowledge cutoff: one exam held before the cutoff and one held after it. Each exam was presented to ChatGPT in both Persian and English. The accuracy and concordance of each response were rated by two blinded adjudicators.
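To make the adjudication step concrete, the sketch below shows one plausible way to tabulate per-question accuracy (agreement with the official answer key) and concordance (internal consistency between the chosen answer and its explanation). The data layout and function names here are illustrative assumptions, not the study's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    accurate: bool     # ChatGPT's answer matched the official key
    concordant: bool   # the explanation was consistent with the chosen answer

def summarize(ratings: list[Rating]) -> tuple[float, float]:
    """Return (accuracy %, concordance %) over a set of rated questions."""
    n = len(ratings)
    accuracy = 100 * sum(r.accurate for r in ratings) / n
    concordance = 100 * sum(r.concordant for r in ratings) / n
    return accuracy, concordance

# Hypothetical usage with three rated questions:
sample = [Rating(True, True), Rating(False, True), Rating(True, False)]
print(summarize(sample))  # (66.66..., 66.66...)
```

In the study itself, each question was rated independently by two blinded adjudicators; a tabulation like this would be applied to their reconciled labels.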
Results: A total of 2210 questions (667 basic sciences, 763 pre-internship, and 780 pre-residency) were presented to ChatGPT in both English and Persian. Across all tests, overall accuracy was 48.5% and overall concordance was 91%. Notably, English questions yielded higher accuracy and concordance (61.4% and 94.5%, respectively) than Persian questions (35.7% and 88.7%).
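As a quick sanity check on these aggregates (assuming every question was presented in both languages, so the English and Persian counts are equal), the overall figures sit close to the simple mean of the per-language figures; the small gap in concordance presumably reflects rounding:

```python
# Arithmetic check on the reported aggregates
print(667 + 763 + 780)     # 2210 questions in total
print((61.4 + 35.7) / 2)   # 48.55 ~ reported overall accuracy of 48.5%
print((94.5 + 88.7) / 2)   # 91.6  ~ reported overall concordance of 91%
```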
Conclusion: Our findings show that ChatGPT scored above the required passing thresholds on the basic sciences and pre-internship exams. It also achieved the minimum score needed to apply for residency positions in Iran, although this was below the applicants' mean score. Notably, the model provided reasoning and contextual information in the majority of its responses. These results provide compelling evidence for the potential use of ChatGPT in medical education.