Researchers at UT Southwestern Medical Center have demonstrated that ChatGPT can dramatically reduce the time required to screen patients for clinical trial eligibility, cutting review times from an average of 40 minutes per patient record to as little as 1.4 minutes in some cases. However, the study, published in Machine Learning: Health, shows that human oversight remains essential because the AI models fall short at accurately identifying all eligible patients.
Performance Comparison Between AI Models
The research team, led by Dr. Mike Dohopolski, evaluated both ChatGPT-3.5 and ChatGPT-4 using data from 74 patients, including 35 already enrolled in a phase 2 cancer trial and 39 randomly selected ineligible patients. GPT-4 generally outperformed GPT-3.5, posting a median accuracy of 84%, although GPT-3.5 reached 91% under its best prompting conditions.
Using the self-discover prompting approach, GPT-4 achieved its highest Youden's Index of 0.73, demonstrating the best balance between sensitivity and specificity. The model maintained median accuracies of 94% and 85% across the two trial contexts examined, ahead of GPT-3.5's median accuracies of 87% and 72% in the same scenarios.
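For context, Youden's Index is defined as sensitivity plus specificity minus one, so it ranges from 0 (no better than chance) to 1 (perfect discrimination). The short Python sketch below shows how the metric is computed from screening counts; the counts are hypothetical and this is not the study's code.

```python
# Illustrative only: computing Youden's Index from screening counts.
# The counts below are hypothetical and are not taken from the study.

def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)   # share of eligible patients correctly flagged
    specificity = tn / (tn + fp)   # share of ineligible patients correctly excluded
    return sensitivity + specificity - 1

# Hypothetical example: 30 of 35 eligible and 33 of 39 ineligible patients
# classified correctly.
print(f"{youden_index(tp=30, fn=5, tn=33, fp=6):.2f}")  # 0.70
```

A value of 0.73 therefore corresponds to a screener whose sensitivity and specificity sum to 1.73.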
Cost and Time Efficiency Analysis
The time and cost differences between the two models were substantial. GPT-3.5 screening required 1.4 to 3.0 minutes per patient at a cost of $0.02 to $0.03 each, while GPT-4 took 7.9 to 12.4 minutes and cost $0.15 to $0.27 per patient. Despite GPT-4's higher price, both models represent substantial time savings compared with the roughly 40-minute manual review of each record.
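As a rough back-of-the-envelope check (my arithmetic, not figures from the paper), the reported per-patient ranges can be scaled to the 74-patient cohort:

```python
# Back-of-the-envelope scaling of the reported per-patient figures to the
# 74-patient cohort; the totals are illustrative, not taken from the paper.
patients = 74

scenarios = {
    "manual review": {"minutes": (40, 40), "dollars": (None, None)},
    "GPT-3.5":       {"minutes": (1.4, 3.0), "dollars": (0.02, 0.03)},
    "GPT-4":         {"minutes": (7.9, 12.4), "dollars": (0.15, 0.27)},
}

for name, s in scenarios.items():
    lo_min, hi_min = (m * patients for m in s["minutes"])
    time_str = f"{lo_min / 60:.1f}-{hi_min / 60:.1f} hours"
    if s["dollars"][0] is None:
        cost_str = "staff time only"
    else:
        lo_c, hi_c = (c * patients for c in s["dollars"])
        cost_str = f"${lo_c:.2f}-${hi_c:.2f}"
    print(f"{name:>13}: {time_str}, {cost_str}")
```

Under these assumptions, manual review of the full cohort would take roughly 49 hours, versus under 4 hours for GPT-3.5 and under 16 hours for GPT-4, with API costs of a few dollars to about twenty dollars.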
"LLMs like GPT-4 can help screen patients for clinical trials, especially when using flexible criteria," said Dohopolski. "They're not perfect, especially when all rules must be met, but they can save time and support human reviewers."
Critical Limitations in Patient Identification
Both AI models demonstrated a concerning pattern in their performance metrics. While achieving high specificity (median 100% for both models), their sensitivity remained problematically low. GPT-3.5 showed median sensitivity of 0%, while GPT-4 achieved only 16% median sensitivity, indicating both models struggle to correctly identify eligible patients despite being effective at ruling out ineligible ones.
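To see how perfect specificity can coexist with near-zero sensitivity, consider a degenerate toy screener that labels every patient ineligible. The sketch below applies the standard definitions to the 35/39 cohort split described earlier; it illustrates the metrics and is not a description of what the models actually did.

```python
# Toy illustration of how 100% specificity and 0% sensitivity can coexist
# with ~53% accuracy. This is a degenerate "label everyone ineligible"
# screener, not a description of the study's models.
eligible, ineligible = 35, 39   # cohort split reported in the study

tp, fn = 0, eligible            # no eligible patient is flagged
tn, fp = ineligible, 0          # every ineligible patient is correctly excluded

sensitivity = tp / (tp + fn)                       # 0.0
specificity = tn / (tn + fp)                       # 1.0
accuracy = (tp + tn) / (eligible + ineligible)     # 39/74, about 0.53

print(f"sensitivity={sensitivity:.0%}, specificity={specificity:.0%}, "
      f"accuracy={accuracy:.0%}")
```

In a roughly balanced cohort like this one, that pattern still yields accuracy in the low 50% range, which is why accuracy alone can mask a failure to surface eligible patients.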
When assessing patient eligibility for trial enrollment, GPT-3.5 achieved a median accuracy of 54% (95% CI, 50%-61%), with its best performance reaching 61.1% using the structured and expert guidance approaches. GPT-4 performed marginally better, with a median accuracy of 61% (95% CI, 54%-65%) and a highest accuracy of 65% using the chain-of-thought plus expert approach.
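The article does not say how these confidence intervals were derived; one common choice for a small cohort is a nonparametric bootstrap over per-patient outcomes, sketched below with hypothetical data rather than the study's analysis.

```python
# Illustrative bootstrap CI for screening accuracy. The paper's actual CI
# method is not specified here; the outcome data below are hypothetical.
import random
import statistics

def bootstrap_accuracy_ci(correct_flags, n_boot=10_000, seed=0):
    """95% bootstrap CI for accuracy; correct_flags is a 0/1 list per patient."""
    rng = random.Random(seed)
    n = len(correct_flags)
    resampled = [
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_boot)
    ]
    cuts = statistics.quantiles(resampled, n=40)  # 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]                      # 2.5th and 97.5th percentiles

# Hypothetical outcomes: 40 correct eligibility calls out of 74 patients.
flags = [1] * 40 + [0] * 34
low, high = bootstrap_accuracy_ci(flags)
print(f"accuracy ~ {sum(flags) / len(flags):.0%}, 95% CI ~ ({low:.0%}, {high:.0%})")
```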
Error Analysis Reveals Processing Challenges
Analysis of 42 misclassifications revealed two primary error types. The most common issue was improper processing of available information, accounting for 95% of GPT-4's errors and 71% of GPT-3.5's errors. This occurred when models correctly identified relevant text but misinterpreted details such as dates, locations, or clinical requirements.
The second error type, failure to identify relevant information, in which the models did not locate the text needed for an accurate response, was more prevalent in GPT-3.5 (29% of errors) than in GPT-4 (5%).
Clinical Trial Enrollment Crisis
The research addresses a critical problem in clinical research, as up to 20% of National Cancer Institute-affiliated trials fail due to inadequate patient enrollment. This failure not only inflates costs and delays results but also undermines the reliability of new treatment assessments.
Part of the challenge stems from valuable patient information buried in unstructured text within electronic health records, such as doctors' notes, which traditional machine learning software struggles to interpret. The researchers suggest that LLMs could help by flagging candidates for subsequent manual review, potentially addressing the capacity limitations that cause eligible patients to be overlooked.
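As an illustration of that flag-for-review idea, the sketch below wraps a single eligibility question around a clinical note using the OpenAI chat API. The model choice, criteria, prompt wording, and the flag_for_review helper are hypothetical and do not reproduce the study's pipeline.

```python
# Hypothetical sketch of an LLM "flag for manual review" pass over clinical
# notes. Assumes the OpenAI Python client; the prompts, model choice, and
# criteria are illustrative and are not taken from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITERIA = """\
1. Histologically confirmed diagnosis of the target cancer.
2. Age 18 or older.
3. No prior enrollment in a conflicting interventional trial.
"""

def flag_for_review(note_text: str, model: str = "gpt-4") -> bool:
    """Return True if the note should be routed to a human reviewer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You screen clinical notes against trial criteria. "
                        "Answer only LIKELY ELIGIBLE or LIKELY INELIGIBLE."},
            {"role": "user",
             "content": f"Criteria:\n{CRITERIA}\nNote:\n{note_text}"},
        ],
    )
    answer = response.choices[0].message.content.strip().upper()
    return answer.startswith("LIKELY ELIGIBLE")  # flagged notes go to a human
```

Patients flagged as likely eligible would then be queued for manual chart review rather than enrolled automatically, consistent with the authors' recommendation that LLMs complement rather than replace human screening.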
Implementation Recommendations
The study authors concluded that "LLM performance varies by prompt, with GPT-4 generally outperforming GPT-3.5, but at higher costs and longer processing times. LLMs should complement, not replace, manual chart reviews for matching patients to clinical trials."
The research team acknowledges several limitations, including concerns about ongoing costs with closed-source GPT models, lack of metadata extraction from clinical notes, and the need for specialized domain expertise to generate effective guidance. The single-institution patient sample with specific documentation styles may also limit generalizability to other healthcare settings.