The Impact of Large Language Models on Diagnostic Reasoning Among LLM-Trained Medical Doctors

Phase: Not Applicable

Status: Completed

Conditions: Diagnosis

Registration Number: NCT06774612

Lead Sponsor: Lahore University of Management Sciences

Brief Summary: This study aims to evaluate whether large language model-trained medical doctors demonstrate enhanced diagnostic reasoning performance when utilizing ChatGPT-4o alongside conventional resources compared to using conventional resources alone.

Detailed Description: Diagnostic errors are a major source of preventable patient harm. Recent advances in large language models (LLMs), particularly ChatGPT-4o, have shown promise in enhancing medical decision-making. However, little is known about their impact on the diagnostic reasoning of medical doctors (e.g., physicians and surgeons).

Diagnostic accuracy relies on complex clinical reasoning and careful evaluation of patient data. While AI assistance could potentially reduce errors and improve efficiency, ChatGPT-4o lacks medical validation and could introduce new risks through incorrect information generation (also known as hallucinations). To mitigate these risks, doctors need adequate training in understanding ChatGPT-4o's capabilities, limitations, and proper usage. Given these uncertainties and the importance of proper AI training, systematic evaluation is essential before clinical implementation.

This randomized study will assess whether ChatGPT-4o access improves LLM-trained medical doctors' diagnostic performance compared to conventional resources (e.g., textbooks, online medical databases) alone. All participating doctors will have completed at least a 10-hour training program covering ChatGPT-4o usage, prompt engineering techniques, and output evaluation strategies. Participants will provide differential diagnoses with supporting evidence and recommended next steps for clinical cases, with responses evaluated by blinded reviewers.

Recruitment & Eligibility

Status: COMPLETED

Sex: All

Target Recruitment: 60

Inclusion Criteria

Full or Provisionally Registered Medical Practitioners with the Pakistan Medical and Dental Council (PMDC).
Completed the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent degree in the US and Canada is the Doctor of Medicine (MD).
Completed a structured training program on the use of ChatGPT (or a comparable large language model), totaling at least 10 hours of instruction. The program must include hands-on practice with LLMs, specifically prompt engineering and content evaluation.

Exclusion Criteria

Any other registered medical practitioners (full or provisional) with the PMDC (e.g., professionals holding a Bachelor of Dental Surgery, or BDS).

Study & Design

Study Type: INTERVENTIONAL

Study Design: PARALLEL

Primary Outcome Measures

Name	Time	Method
Diagnostic reasoning	Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0-4 days after participant enrollment.	The primary outcome is the percent correct for each case (range: 0 to 100). For each case, participants are asked for their three top diagnoses, the findings from the case that support each diagnosis, and the findings from the case that oppose it. Participants receive 1 point for each plausible diagnosis. Supporting findings and opposing findings are each graded for correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants are then asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants are asked to name up to 3 next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
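
The scoring rubric above can be sketched in code to make the arithmetic concrete. This is a minimal illustration and not the study's grading tool: the class and function names are hypothetical, and two points the registry entry leaves open are treated as assumptions, namely that next steps are graded per step (up to 3 steps, 0-2 points each) and that the denominator is the maximum attainable under that reading (23 points).

```python
from dataclasses import dataclass, field
from typing import List

MAX_DIAGNOSES = 3   # "three top diagnoses" per case
MAX_NEXT_STEPS = 3  # "up to 3 next steps" per case

# Assumption: every slot counts toward the denominator, and next steps
# are graded per step. Under that reading the maximum is 3*(1+2+2) + 2 + 3*2 = 23.
MAX_POINTS = MAX_DIAGNOSES * (1 + 2 + 2) + 2 + MAX_NEXT_STEPS * 2


@dataclass
class DiagnosisGrade:
    plausible: bool   # 1 point if the diagnosis is plausible
    supporting: int   # 0, 1 (partially correct), or 2 (completely correct)
    opposing: int     # same 0-2 scale as supporting


@dataclass
class CaseGrade:
    diagnoses: List[DiagnosisGrade]                      # up to 3 entries
    top_diagnosis: int                                   # 0, 1 (reasonable), or 2 (most correct)
    next_steps: List[int] = field(default_factory=list)  # up to 3 entries, each 0-2


def percent_correct(case: CaseGrade) -> float:
    """Return the case score as a percentage of MAX_POINTS (0 to 100)."""
    if len(case.diagnoses) > MAX_DIAGNOSES:
        raise ValueError("at most 3 diagnoses are graded per case")
    if len(case.next_steps) > MAX_NEXT_STEPS:
        raise ValueError("at most 3 next steps are graded per case")

    graded = [case.top_diagnosis, *case.next_steps]
    for d in case.diagnoses:
        graded.extend([d.supporting, d.opposing])
    if any(g not in (0, 1, 2) for g in graded):
        raise ValueError("graded items must be 0, 1, or 2")

    points = sum(int(d.plausible) + d.supporting + d.opposing for d in case.diagnoses)
    points += case.top_diagnosis + sum(case.next_steps)
    return 100.0 * points / MAX_POINTS


example = CaseGrade(
    diagnoses=[
        DiagnosisGrade(plausible=True, supporting=2, opposing=1),
        DiagnosisGrade(plausible=True, supporting=1, opposing=0),
        DiagnosisGrade(plausible=False, supporting=0, opposing=0),
    ],
    top_diagnosis=2,
    next_steps=[2, 1],
)
score = percent_correct(example)  # 11 of 23 points under the assumptions above
```

If next steps were instead graded as a single 0-2 item, only `MAX_POINTS` and the `next_steps` field would need to change; the per-case percentage would still fall in the stated 0 to 100 range.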

Secondary Outcome Measures

Name	Time	Method
Time Spent on Diagnosis	Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place between 0-4 days after participant enrollment.	We will compare how much time (in seconds) participants spend per case between the two study arms.

Trial Locations

Locations (1): Lahore University of Management Sciences, Lahore, Punjab, Pakistan

Related Trials

Evaluating the Potential of Large Language Models for Respiratory Disease Consultations

Completed; Phase: Not Applicable

North Sichuan Medical College

Posted 6/13/2024

Updated 11/27/2024

Physician Reasoning on Diagnostic Cases With Large Language Models

Completed; Phase: Not Applicable

Stanford University

Posted 12/6/2023

Updated 2/20/2024

Efficacy of Using Large Language Model to Assist in Diabetic Retinopathy Detection

Completed; Phase: Not Applicable

Sun Yat-sen University

Posted 2/9/2022

Updated 1/19/2024

Artificial Intelligent Clinical Decision Support System Simulation Center Study for Technology Acceptance

Completed; Phase: Not Applicable

Yale University

Posted 4/18/2023

Updated 3/10/2025

Application of Multimodal Large Language Model in HFpEF

Recruiting

Peking University Third Hospital

Posted 7/3/2024

The Application of Large Language Model in Emergency Chest Pain Triage

Recruiting; Phase: Not Applicable

Peking University Third Hospital

Posted 7/9/2024

Free Text Prediction Algorithm for Appendicitis

Completed

National University Hospital, Singapore

Posted 1/30/2018

Updated 3/3/2021

Effect of Large Language Model in Assisting Discharge Summary Notes Writing for Hospitalized Patients

Enrolling by Invitation; Phase: Not Applicable

Mayo Clinic

Posted 2/16/2024

Updated 1/24/2025

Treatment Recommendations for Gastrointestinal Cancers Via Large Language Models

Recruiting; Phase: Not Applicable

Chinese Academy of Sciences

Posted 8/21/2023

Updated 9/8/2023

DISCOVERY: Diagnostic Data and Genetic Polymorphisms in ICD Patients.

Completed; Phase: Not Applicable

Medtronic Cardiac Rhythm and Heart Failure

Posted 5/25/2007

Updated 7/2/2025
