MedPath

Physician Reasoning on Diagnostic Cases With Large Language Models

Phase
Not Applicable
Status
Completed
Conditions
Diagnosis
Interventions
Other: GPT-4
Registration Number
NCT06157944
Lead Sponsor
Stanford University
Brief Summary

This study will evaluate the effect of providing access to GPT-4, a large language model, compared to traditional diagnostic decision support tools on performance on case-based diagnostic reasoning tasks.

Detailed Description

Artificial intelligence (AI) technologies, specifically advanced large language models such as OpenAI's GPT-4, have the potential to improve medical decision-making. Although GPT-4 was not developed specifically for medical applications, it has demonstrated promise in various healthcare contexts, including medical note-writing, answering patient inquiries, and facilitating medical consultation. However, little is known about how such models augment the clinical reasoning abilities of clinicians.

Clinical reasoning is a complex process involving pattern recognition, knowledge application, and probabilistic reasoning. Integrating AI tools like GPT-4 into physician workflows could help reduce clinician workload and decrease the likelihood of missed diagnoses. However, GPT-4 was not developed for clinical reasoning, nor has it been validated for this purpose. Further, it may produce misinformation, including convincing confabulations that could mislead clinicians. If clinicians misuse the tool, it may fail to improve diagnostic reasoning and could even cause harm. It is therefore important to study how clinicians use large language models to augment clinical reasoning before such models are routinely incorporated into patient care.

In this study, we will randomize participants to answer diagnostic cases with or without access to GPT-4. Participants will be asked to give three differential diagnoses for each case, with supporting and opposing findings for each diagnosis. Additionally, they will be asked to provide their top diagnosis along with next diagnostic steps. Answers will be graded by independent reviewers blinded to treatment assignment.

Recruitment & Eligibility

Status
COMPLETED
Sex
All
Target Recruitment
50
Inclusion Criteria
  • Participants must be licensed physicians and have completed at least post-graduate year 2 (PGY2) of medical training.
  • Training in internal medicine, family medicine, or emergency medicine.
Exclusion Criteria
  • Not currently practicing clinically.

Study & Design

Study Type
INTERVENTIONAL
Study Design
PARALLEL
Arms & Interventions
Group: GPT-4
Intervention: GPT-4
Description: Group will be given access to GPT-4.
Primary Outcome Measures
Name: Diagnostic reasoning
Time Frame: During evaluation

The primary outcome will be the percent correct (range: 0 to 100) for each case. For each case, participants will be asked for their three top diagnoses, along with findings from the case that support and oppose each diagnosis. Participants will receive 1 point for each plausible diagnosis. Supporting and opposing findings will also be graded for correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to 3 next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
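As a rough illustration, the Python sketch below computes a per-case percent-correct score from the rubric above. The function name and the assumed maximum of 23 points (2 points each for supporting and opposing findings per diagnosis, and 2 points per next step) are our reading of this summary, not a scoring script published by the study.

    # Illustrative per-case scoring sketch; rubric weights and the maximum
    # attainable score are assumptions inferred from the summary above.

    def case_percent_correct(
        plausible_diagnoses: int,        # 0-3 plausible diagnoses, 1 point each
        supporting_findings: list[int],  # per diagnosis: 0, 1 (partial), or 2 (complete)
        opposing_findings: list[int],    # per diagnosis: 0, 1 (partial), or 2 (complete)
        top_diagnosis: int,              # 0, 1 (reasonable), or 2 (most correct)
        next_steps: list[int],           # up to 3 steps: 0, 1 (partial), or 2 (complete)
    ) -> float:
        points = (
            plausible_diagnoses
            + sum(supporting_findings)
            + sum(opposing_findings)
            + top_diagnosis
            + sum(next_steps)
        )
        # Assumed maximum: 3 diagnoses + 3 * (2 + 2) findings points
        # + 2 for the top diagnosis + 3 * 2 for next steps = 23 points.
        max_points = 3 + 3 * (2 + 2) + 2 + 3 * 2
        return 100.0 * points / max_points

    # Example: a strong but imperfect response scores 19/23 ~= 82.6%.
    print(case_percent_correct(3, [2, 2, 1], [2, 1, 1], 2, [2, 2, 1]))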

Secondary Outcome Measures
Name: Time Spent on Diagnosis
Time Frame: During evaluation

We will compare how much time (in minutes) participants spend per case between the two study arms.

Trial Locations

Locations (1)

Stanford University

Palo Alto, California, United States
