A recent randomized clinical trial published in JAMA Network Open investigated the impact of large language models (LLMs) on physician diagnostic reasoning. The study, conducted across multiple academic medical institutions, found that providing physicians with access to an LLM did not significantly improve their diagnostic accuracy compared to using conventional resources. However, the LLM alone outperformed both physician groups, suggesting potential for AI in clinical decision support with further development.
The trial, which involved 50 physicians from family medicine, internal medicine, and emergency medicine, assessed diagnostic reasoning performance using a standardized rubric. Participants were randomized to access either an LLM (ChatGPT Plus) in addition to conventional diagnostic resources or conventional resources only. They were given 60 minutes to review up to six clinical vignettes, and their diagnostic performance was scored on differential diagnosis accuracy, the appropriateness of supporting and opposing factors, and the selection of next diagnostic evaluation steps.
The primary outcome, the median diagnostic reasoning score per case, was 76% for the LLM group and 74% for the conventional resources-only group. The adjusted difference of 2 percentage points (95% CI, -4 to 8 percentage points; P = .60) was not statistically significant. Similarly, the median time spent per case was 519 seconds for the LLM group and 565 seconds for the conventional resources group, with a non-significant time difference of -82 seconds (95% CI, -195 to 31 seconds; P = .20).
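The trial reports these contrasts as adjusted differences with 95% confidence intervals. As a rough, hypothetical illustration of how a between-group difference and its uncertainty interval can be estimated (this is not the authors' statistical model, and the per-case scores below are made up), a minimal bootstrap sketch in Python might look like this:

```python
# Illustrative only: bootstrap a between-group difference in median scores
# with a 95% interval, using hypothetical per-case scores. The trial itself
# reports an adjusted analysis; this sketch only shows the general idea.
import random

random.seed(0)

# Hypothetical per-case diagnostic reasoning scores (percent)
llm_group = [76, 80, 70, 74, 82, 78, 72, 75, 79, 77]
conventional = [74, 72, 68, 75, 76, 70, 73, 71, 77, 69]

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def bootstrap_diffs(a, b, n_boot=10_000):
    """Resample each group with replacement and collect median differences."""
    diffs = []
    for _ in range(n_boot):
        resampled_a = [random.choice(a) for _ in a]
        resampled_b = [random.choice(b) for _ in b]
        diffs.append(median(resampled_a) - median(resampled_b))
    diffs.sort()
    return diffs

diffs = bootstrap_diffs(llm_group, conventional)
point = median(llm_group) - median(conventional)
lo, hi = diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))]
print(f"median difference: {point:.1f} points (95% CI {lo:.1f} to {hi:.1f})")
```

The published estimates come from the authors' adjusted analysis, so a simple resampling like this serves only as a conceptual stand-in for how such an interval conveys uncertainty around a group difference.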
LLM Standalone Performance
Interestingly, when the LLM was used alone to answer the cases, it scored 16 percentage points (95% CI, 2 to 30 percentage points; P = .03) higher than the conventional resources group. This suggests that the LLM has the potential to enhance diagnostic accuracy, but its integration into clinical practice requires further refinement.
Implications for Clinical Practice
The study's findings have significant implications for the integration of AI into clinical practice. While LLMs have shown promise in medical reasoning examinations, their effectiveness in improving physician diagnostic reasoning remains uncertain. The results suggest that simply providing access to LLMs may not be sufficient to improve overall physician diagnostic reasoning.
Ethan Goh, MD, MS, the corresponding author of the study, noted that the results highlight the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice. He suggested that training clinicians in best prompting practices may improve their performance with LLMs. Alternatively, organizations could invest in predefined prompts for diagnostic decision support integrated into clinical workflows and documentation, so that the tools and clinicians work in concert.
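As a purely hypothetical sketch of what such a predefined prompt might look like (the wording below is illustrative and not drawn from the study), a structured template could mirror the trial's rubric of differential diagnosis, supporting and opposing findings, and next diagnostic steps:

```python
# Hypothetical example of a predefined prompt for diagnostic decision support.
# The structure mirrors the rubric described in the trial; the wording is
# illustrative, not the prompt or workflow used in the study.
DIAGNOSTIC_PROMPT = """You are assisting a physician with a diagnostic case.

Case vignette:
{vignette}

Please respond with:
1. A ranked differential diagnosis (most to least likely).
2. For each leading diagnosis, the findings that support it and the findings
   that argue against it.
3. The most appropriate next diagnostic evaluation steps.
"""

def build_prompt(vignette: str) -> str:
    """Fill the predefined template with a specific case vignette."""
    return DIAGNOSTIC_PROMPT.format(vignette=vignette)

if __name__ == "__main__":
    example = "A 54-year-old presents with pleuritic chest pain and low-grade fever..."
    print(build_prompt(example))
```

Embedding a fixed template of this kind in the workflow, rather than relying on each clinician's ad hoc prompting, is one way an organization could standardize how the model is queried.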
Structured Reflection as an Assessment Tool
The study also developed a measure based on structured reflection to evaluate diagnostic reasoning skills. This assessment tool demonstrated substantial inter-rater agreement and internal reliability, advancing the field beyond early LLM research that focused on benchmarks with limited clinical utility.
Limitations
The authors acknowledged several limitations, including the focus on a single LLM and the lack of explicit training in prompt engineering techniques for participants. In addition, the study used a limited number of clinical vignettes, which may not represent the full breadth of cases encountered in clinical practice.
Conclusion
Despite these limitations, the study provides valuable insights into the potential and challenges of using LLMs in clinical practice. While the LLM alone outperformed both physician groups on these cases, its integration into clinical workflows requires further development to enhance physician-AI collaboration and improve diagnostic accuracy.