The field of AI-powered drug discovery is showing early promise, with a recent analysis published in Drug Discovery Today reporting encouraging results for AI-designed drugs in clinical trials. Meanwhile, the FDA is increasing its scrutiny of unapproved GLP-1 drugs, and a recent trial explored the role of large language models (LLMs) in enhancing physician diagnostic reasoning.
AI-Discovered Drugs Show High Success Rates in Phase I Trials
The Boston Consulting Group analyzed clinical trial outcomes from 75 AI-discovered drugs developed by AI-native biotech companies, many partnered with pharmaceutical giants. The findings are particularly striking in Phase I, where AI-discovered molecules demonstrated an 80–90% success rate. This far exceeds traditional Phase I averages of 40–65%, suggesting AI models are adept at identifying drug candidates with favorable safety and pharmacokinetic profiles.
In Phase II, where proof-of-concept and efficacy are tested, success rates drop to 40%, aligning closely with historical averages. The study notes that the "hard problem" of translating AI-predicted biology into clinical efficacy remains. However, a deeper dive reveals that only a fraction of Phase II failures were due to negative outcomes; many were discontinued for business or operational reasons, reflecting broader economic pressures in the biotech sector.
The success of AI in Phase I trials could stem from its ability to optimize drug-likeness, predict safety, and explore novel chemical space, reducing early failures. Notably, AI-driven approaches are diversifying. While AI-repurposed drugs once dominated, AI-discovered small molecules, vaccines, and biologics are gaining ground, with oncology leading as the most common therapeutic focus.
The authors propose an optimistic thought experiment: if AI’s early success rates hold and Phase III remains on par with traditional averages, end-to-end clinical trial success rates could nearly double from 5–10% to 9–18%. This would significantly enhance R&D productivity, enabling faster, cheaper delivery of innovative medicines to patients.
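The doubling in this thought experiment follows from simple multiplication of per-phase success probabilities. A minimal sketch of the arithmetic is below; note that the Phase III and approval rates used here are assumed illustrative historical values, not figures from the BCG analysis.

```python
# Back-of-envelope sketch: how a Phase I gain compounds into
# end-to-end clinical success. Phase III (p3) and approval rates
# are assumed historical values for illustration only.

def end_to_end(p1, p2, p3=0.55, approval=0.9):
    """Probability a candidate entering Phase I reaches approval."""
    return p1 * p2 * p3 * approval

# Traditional pipeline: mid-range Phase I (~50%), Phase II ~40%
traditional = end_to_end(p1=0.50, p2=0.40)

# AI-discovered pipeline: Phase I 80-90%, Phase II ~40% (per the analysis)
ai_low = end_to_end(p1=0.80, p2=0.40)
ai_high = end_to_end(p1=0.90, p2=0.40)

print(f"traditional ~ {traditional:.0%}")          # ~10%
print(f"AI-discovered ~ {ai_low:.0%}-{ai_high:.0%}")  # ~16%-18%
```

Under these assumptions the AI pipeline roughly doubles end-to-end success, consistent with the 5–10% versus 9–18% ranges the authors describe.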
FDA Cracks Down on Unapproved GLP-1 Drugs
GLP-1 receptor agonists have soared in popularity due to their effectiveness in weight loss and diabetes management. This surge in demand has caused significant shortages, prompting compounding pharmacies to step in and fill the gap. However, the FDA has raised serious concerns about the growing availability of unapproved and compounded versions of these drugs, including semaglutide and tirzepatide.
Unlike FDA-approved medications, compounded drugs are custom-made by mixing ingredients to address specific patient needs or general shortages. However, they bypass the FDA's safety and efficacy evaluations, leading to reports of adverse effects, including dosing errors and hospitalizations. These compounded versions may also contain inconsistent dosages or unverified ingredients, increasing the risks to patients.
Beyond compounding pharmacies, even less scrupulous vendors have entered the market, selling counterfeit GLP-1 products. The FDA is actively cracking down on these companies, many of which market their products as “research use only” but sell them for off-label human use. Recent warning letters have targeted online vendors contributing to this growing trend, fueled by the surging popularity of GLP-1 therapies.
AI Physicians Are on the Horizon
A recent randomized clinical trial explored the role of large language models (LLMs) like GPT-4 in enhancing physician diagnostic reasoning. The findings, published in JAMA Network Open, present a snapshot of the current capabilities and limitations of AI-clinician pairings in clinical settings.
In the trial, 50 physicians from internal, family, and emergency medicine specialties were randomized into two groups: one with access to GPT-4 alongside conventional diagnostic tools and the other relying solely on traditional resources. Both groups evaluated six clinical vignettes (patient case scenarios designed to simulate complex real-world cases). Surprisingly, the presence of GPT-4 did not significantly improve diagnostic performance: physicians using GPT-4 scored a median of 76% on diagnostic reasoning tasks, marginally higher than the 74% achieved by those using only conventional resources, a difference that did not reach statistical significance.
In a surprising twist, GPT-4’s standalone diagnostic performance outshone both groups, achieving a median score of 92%. This suggests that the model’s diagnostic capabilities are strong, but physicians may not yet be equipped to fully integrate its insights into their decision-making processes. The authors noted that structured training in prompt engineering and human-AI interaction could help clinicians harness the full potential of tools like GPT-4. Alternatively, this result may hint at the possibility that in certain aspects of diagnostics, AI systems are already capable of surpassing human expertise.