Alireza Keshtkar, Ali-Asghar Hayat, Farnaz Atighi, Nazanin Ayare, Mohammadreza Keshtkar, Parsa Yazdanpanahi, Erfan Sadeghi, Noushin Deilami, Hamid Reihani, Alireza Karimi, Hamidreza Mokhtari, Mohammad Hashem Hashempur
Research Center for Traditional Medicine and History of Medicine, Department of Persian Medicine, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran; hashempur@gmail.com
Abstract:
Background: OpenAI's ChatGPT is a large language model that uses a 175-billion-parameter transformer architecture to perform natural language processing tasks. This study evaluates ChatGPT's knowledge and interpretive ability on three levels of Iran's medical licensing examinations: basic sciences, pre-internship, and pre-residency.
Methods: In this comparative study, the three levels of Iran's medical licensing exams (basic sciences, pre-internship, and pre-residency) were administered to ChatGPT 3.5. Two versions of each exam were used relative to the model's knowledge cutoff: one exam held before the cutoff and one held after it. Each exam was presented to ChatGPT in both Persian and English. The accuracy and concordance of each response were rated by two blinded adjudicators.
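To make the adjudication step concrete, the sketch below shows one plausible way to tabulate per-question accuracy (agreement with the official answer key) and concordance (internal consistency between the chosen answer and its explanation). The data layout and function names here are illustrative assumptions, not the study's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    accurate: bool     # ChatGPT's answer matched the official key
    concordant: bool   # the explanation was consistent with the chosen answer

def summarize(ratings: list[Rating]) -> tuple[float, float]:
    """Return (accuracy %, concordance %) over a set of rated questions."""
    n = len(ratings)
    accuracy = 100 * sum(r.accurate for r in ratings) / n
    concordance = 100 * sum(r.concordant for r in ratings) / n
    return accuracy, concordance

# Hypothetical usage with three rated questions:
sample = [Rating(True, True), Rating(False, True), Rating(True, False)]
print(summarize(sample))  # (66.66..., 66.66...)
```

In the study itself, each question was rated independently by two blinded adjudicators; a tabulation like this would be applied to their reconciled labels.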
Results: A total of 2210 questions (667 basic sciences, 763 pre-internship, and 780 pre-residency) were presented to ChatGPT in both English and Persian. Across all tests, overall accuracy was 48.5% and overall concordance was 91%. Notably, English questions yielded higher accuracy and concordance (61.4% and 94.5%, respectively) than Persian questions (35.7% and 88.7%).
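As a quick sanity check on these aggregates (assuming every question was presented in both languages, so the English and Persian counts are equal), the overall figures sit close to the simple mean of the per-language figures; the small gap in concordance presumably reflects rounding:

```python
# Arithmetic check on the reported aggregates
print(667 + 763 + 780)     # 2210 questions in total
print((61.4 + 35.7) / 2)   # 48.55 ~ reported overall accuracy of 48.5%
print((94.5 + 88.7) / 2)   # 91.6  ~ reported overall concordance of 91%
```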
Conclusion: Our findings show that ChatGPT scored above the required passing thresholds on the basic sciences and pre-internship exams. It also achieved the minimum score needed to apply for residency positions in Iran, although this was below the applicants' mean score. Notably, the model provided reasoning and contextual information in the majority of its responses. These results provide compelling evidence for the potential use of ChatGPT in medical education.