Physician Reasoning on Management Cases With Large Language Models
- Conditions
- Clinical Decision-making
- Interventions
- Other: GPT-4
- Registration Number
- NCT06208423
- Lead Sponsor
- Stanford University
- Brief Summary
This study will evaluate how access to GPT-4, a large language model, affects physician performance on case-based management reasoning tasks, compared with access to traditional management decision support tools.
- Detailed Description
Artificial intelligence (AI) technologies, particularly advanced large language models such as OpenAI's ChatGPT, have the potential to improve medical decision-making. Although ChatGPT-4 was not developed specifically for medical applications, it has shown promise in various healthcare contexts, including writing medical notes, answering patient inquiries, and facilitating medical consultation. However, little is known about how ChatGPT augments the clinical reasoning abilities of clinicians.
Clinical reasoning is a complex process involving pattern recognition, knowledge application, and probabilistic reasoning. Integrating AI tools such as ChatGPT-4 into physician workflows could help reduce clinician workload and decrease the likelihood of mismanagement. However, ChatGPT-4 was neither developed nor validated for clinical reasoning. Further, it may produce misinformation, including convincing confabulations that may mislead clinicians. If clinicians misuse this tool, it may fail to improve reasoning and could even cause harm. It is therefore important to study how clinicians use large language models to augment clinical reasoning before these tools are routinely incorporated into patient care.
In this study, participants will be randomized to answer clinical management cases with or without access to ChatGPT-4. Each case has multiple components, and participants will be asked to explain their reasoning for each component. Answers will be graded by independent reviewers blinded to treatment assignment. For each case, a panel of 4-7 expert discussants developed a grading rubric: each discussant first drafted a rubric independently, and any discrepancies were then resolved through multiple rounds of discussion.
Recruitment & Eligibility
- Status
- RECRUITING
- Sex
- All
- Target Recruitment
- 50
- Inclusion Criteria
- Participants must be licensed physicians who have completed at least post-graduate year 2 (PGY2) of medical training.
- Training in internal medicine, family medicine, or emergency medicine.
- Exclusion Criteria
- Not currently practicing clinically.
Study & Design
- Study Type
- INTERVENTIONAL
- Study Design
- PARALLEL
- Arms & Interventions
| Group | Intervention | Description |
|---|---|---|
| GPT-4 | GPT-4 | Participants in this group will be given access to GPT-4. |
- Primary Outcome Measures
| Name | Time Frame | Method |
|---|---|---|
| Management Reasoning | Within the one-hour study session | Percent correct (range: 0 to 100) for each case. |
- Secondary Outcome Measures
| Name | Time Frame | Method |
|---|---|---|
| Time Spent on Management | Within the one-hour study session | Time (in minutes) participants spend per case, compared between the two study arms. |
Trial Locations
- Locations (1)
Stanford University
Palo Alto, California, United States