Barw
Barw Medical Journal, ISSN 2960-1959, Volume 3, Issue 3, 2025-04-30

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study

Pages: 6-12
DOI: 10.58742/bmj.v3i3.180
Language: eng

Authors:
Talar Sabir Ahmed, Hiwa Cancer Hospital, Shorsh Street, Sulaymaniyah, Iraq. talar.ahmed@gmail.com
Rawa M. Ali, Hospital for Treatment of Victims of Chemical Weapons, Mawlawy Street, Halabja, Iraq. rawa.ali@gmail.com
Ari M. Abdullah, Department of Pathology, Sulaymaniyah Teaching Hospital, Sulaymaniyah, Iraq. ariabdullah1978@gmail.com
Hadeel A. Yasseen, College of Medicine, University of Sulaimani, Madam Mitterrand Street, Sulaymaniyah, Iraq. hadeel.yasseen@gmail.com
Ronak S. Ahmed, Shahid Nabaz Dermatology Teaching Center for Treating Skin Diseases, Sulaymaniyah Directorate of Health, Sulaymaniyah, Iraq. ronakahmed76@gmail.com
Ameer M. Salih, Scientific Affairs Department, Smart Health Tower, Madam Mitterrand Street, Sulaymaniyah, Iraq. ameer.salih@univsul.edu.iq
Dilan S. Hiwa, Scientific Affairs Department, Smart Health Tower, Madam Mitterrand Street, Sulaymaniyah, Iraq. dilan.sarmad.hiwa@gmail.com
Shvan H. Mohammed, Xzmat polyclinic, Rizgari, Kalar, Sulaymaniyah, Iraq. shvanh80@gmail.com

2025-04-05

Introduction
How large language models (LLMs) will be integrated into pathology is not yet fully understood. This study examines the accuracy, benefits, biases, and limitations of LLMs in diagnosing dermatologic conditions within pathology.

Methods
A pathologist compiled 60 real histopathology case scenarios of skin conditions from a hospital database. Two other pathologists reviewed each patient’s demographics, clinical details, histopathology findings, and original diagnosis. The cases were then presented to ChatGPT-3.5, Gemini, and an external pathologist. Each response was classified as complete agreement, partial agreement, or no agreement with the original pathologist’s diagnosis.

Results
ChatGPT-3.5 gave 29 (48.3%) complete-agreement, 14 (23.3%) partial-agreement, and 17 (28.3%) no-agreement responses. Gemini gave 20 (33%), 9 (15%), and 31 (52%) complete-agreement, partial-agreement, and no-agreement responses, respectively. The external pathologist gave 36 (60%), 17 (28%), and 7 (12%) complete-agreement, partial-agreement, and no-agreement responses, respectively, relative to the original pathologist’s diagnosis. Diagnostic agreement differed significantly between the LLMs (ChatGPT-3.5, Gemini) and the pathologist (P < 0.001).

Conclusion
In certain instances, ChatGPT-3.5 and Gemini may provide an accurate diagnosis of skin pathologies when presented with relevant patient history and descriptions of histopathological reports. However, their overall performance is insufficient for reliable use in real-life clinical settings.
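
The agreement counts reported in the Results can be cross-checked with a short calculation. The abstract does not name the statistical test behind the reported P value, so the sketch below assumes a Pearson chi-square test of independence on the full 3 x 3 table (three raters by three agreement categories); the authors may instead have used pairwise comparisons or a different test. The function names `percentages`, `chi_square_independence`, and `chi2_sf_even_dof` are illustrative only and are not taken from the study.

```python
import math

# Agreement counts from the Results: (complete, partial, none), 60 cases per rater.
counts = {
    "ChatGPT-3.5": (29, 14, 17),
    "Gemini": (20, 9, 31),
    "External pathologist": (36, 17, 7),
}


def percentages(row):
    """Return each count as a percentage of the row total, to one decimal."""
    total = sum(row)
    return tuple(round(100 * n / total, 1) for n in row)


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = [list(r) for r in table]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability of a chi-square variable; closed form for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


# Assumed test: one chi-square test across all three raters (not stated in the abstract).
stat, dof = chi_square_independence(counts.values())
p_value = chi2_sf_even_dof(stat, dof)

for rater, row in counts.items():
    print(rater, percentages(row))
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.2g}")
```

Under this assumed test the P value falls below 0.001, which is consistent with the significance level reported in the Results, although it does not confirm which test the authors actually ran.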