The Impact of Large Language Models on Diagnostic Reasoning Among LLM-Trained Medical Doctors
- Conditions
- Diagnosis
- Registration Number
- NCT06774612
- Lead Sponsor
- Lahore University of Management Sciences
- Brief Summary
This study aims to evaluate whether large language model-trained medical doctors demonstrate enhanced diagnostic reasoning performance when utilizing ChatGPT-4o alongside conventional resources compared to using conventional resources alone.
- Detailed Description
Diagnostic errors are a major source of preventable patient harm. Recent large language models (LLMs), particularly ChatGPT-4o, have shown promise in supporting medical decision-making. However, little is known about their impact on the diagnostic reasoning of medical doctors (e.g., physicians and surgeons).
Diagnostic accuracy relies on complex clinical reasoning and careful evaluation of patient data. While AI assistance could potentially reduce errors and improve efficiency, ChatGPT-4o lacks medical validation and could introduce new risks through incorrect information generation (also known as hallucinations). To mitigate these risks, doctors need adequate training in understanding ChatGPT-4o's capabilities, limitations, and proper usage. Given these uncertainties and the importance of proper AI training, systematic evaluation is essential before clinical implementation.
This randomized study will assess whether ChatGPT-4o access improves LLM-trained medical doctors' diagnostic performance compared to conventional resources (e.g., textbooks, online medical databases) alone. All participating doctors will have completed at least a 10-hour training program covering ChatGPT-4o usage, prompt engineering techniques, and output evaluation strategies. Participants will provide differential diagnoses with supporting evidence and recommended next steps for clinical cases, with responses evaluated by blinded reviewers.
Recruitment & Eligibility
- Status
- COMPLETED
- Sex
- All
- Target Recruitment
- 60
- Inclusion Criteria
- Full or provisionally registered medical practitioners with the Pakistan Medical and Dental Council (PMDC).
- Completed the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent degree in the US and Canada is the Doctor of Medicine (MD).
- Completed a structured training program on the use of ChatGPT (or a comparable large language model) totaling at least 10 hours of instruction. The program must include hands-on practice in prompt engineering and in evaluating LLM-generated content.
- Exclusion Criteria
- Any other registered practitioners (full or provisional) with the PMDC (e.g., holders of a Bachelor of Dental Surgery, BDS).
Study & Design
- Study Type
- INTERVENTIONAL
- Study Design
- PARALLEL
- Primary Outcome Measures
- Name: Diagnostic reasoning
- Time: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
- Method: The primary outcome will be the percent correct for each case (range: 0 to 100). For each case, participants will be asked for their top three diagnoses, the findings from the case that support each diagnosis, and the findings that oppose it. Each plausible diagnosis earns 1 point. Supporting and opposing findings are also graded for correctness, with 1 point for a partially correct and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to three next steps to further evaluate the patient, with 1 point for a partially correct and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
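The scoring rubric in the primary outcome can be read as a points-to-percent conversion. The Python sketch below is illustrative only and rests on several assumptions. It assumes that a case's percent correct equals the points earned divided by the maximum attainable points. It also assumes that supporting and opposing findings are each scored once per diagnosis on a 0/1/2 scale, and that each next step is scored separately. Under those assumptions the maximum is 3 × (1 + 2 + 2) + 2 + 3 × 2 = 23 points. The registry does not state the denominator, so the trial's actual scoring may differ. The names `DiagnosisGrade`, `CaseGrade`, and `percent_correct` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical grading records; field names are illustrative, not taken from the trial.
@dataclass
class DiagnosisGrade:
    plausible: bool   # 1 point if the listed diagnosis is plausible
    supporting: int   # supporting findings: 0 = incorrect, 1 = partial, 2 = complete
    opposing: int     # opposing findings: same 0/1/2 scale (assumed per diagnosis)


@dataclass
class CaseGrade:
    diagnoses: list[DiagnosisGrade]  # up to three differential diagnoses
    top_diagnosis: int               # 0 = other, 1 = reasonable, 2 = most correct
    next_steps: list[int]            # up to three next steps, each scored 0/1/2


MAX_DIAGNOSES = 3
MAX_NEXT_STEPS = 3
# Assumed denominator: 3 * (1 + 2 + 2) + 2 + 3 * 2 = 23 points.
MAX_POINTS = MAX_DIAGNOSES * (1 + 2 + 2) + 2 + MAX_NEXT_STEPS * 2


def _component(score: int) -> int:
    """Validate a 0/1/2 component score and return it unchanged."""
    if score not in (0, 1, 2):
        raise ValueError(f"component score must be 0, 1 or 2, got {score}")
    return score


def percent_correct(case: CaseGrade) -> float:
    """Convert one case's rubric points to a 0-100 percent (assumed formula)."""
    if len(case.diagnoses) > MAX_DIAGNOSES or len(case.next_steps) > MAX_NEXT_STEPS:
        raise ValueError("at most three diagnoses and three next steps are graded")
    points = 0
    for d in case.diagnoses:
        points += int(d.plausible)
        points += _component(d.supporting) + _component(d.opposing)
    points += _component(case.top_diagnosis)
    points += sum(_component(step) for step in case.next_steps)
    return 100.0 * points / MAX_POINTS


# Example: 7 (diagnoses) + 2 (top diagnosis) + 3 (next steps) = 12 of 23 points.
example = CaseGrade(
    diagnoses=[
        DiagnosisGrade(plausible=True, supporting=2, opposing=1),   # 4 points
        DiagnosisGrade(plausible=True, supporting=1, opposing=1),   # 3 points
        DiagnosisGrade(plausible=False, supporting=0, opposing=0),  # 0 points
    ],
    top_diagnosis=2,
    next_steps=[2, 1],
)
print(round(percent_correct(example), 1))  # → 52.2
```

The sketch rejects component scores outside 0, 1, or 2. This means a grading slip raises an error instead of quietly inflating a case's percentage.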
- Secondary Outcome Measures
- Name: Time Spent on Diagnosis
- Time: Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
- Method: We will compare how much time (in seconds) participants spend per case between the two study arms.
Trial Locations
- Locations (1)
Lahore University of Management Sciences
🇵🇰Lahore, Punjab, Pakistan