• A randomized clinical trial assessed whether large language models (LLMs) improve diagnostic reasoning among family, internal, and emergency medicine physicians.
• Physicians with access to an LLM did not show significantly better diagnostic reasoning than those using conventional resources alone; the 2% difference in scores was not statistically significant.
• The LLM alone outperformed both physician groups, scoring 16% higher than physicians using conventional resources, which highlights the potential of AI in diagnostics.
• Further development is needed to integrate LLMs effectively into clinical practice so that physician-AI collaboration and diagnostic accuracy improve.