AI-Assisted Acute Myeloid Leukemia Evaluation With the Leukemia End-to-End Analysis Platform (LEAP) Versus Clinician-Only Assessment
- Conditions
- Acute Promyelocytic Leukemia (APL)
- Acute Myeloid Leukemia (AML)
- Registration Number
- NCT07203885
- Lead Sponsor
- Harvard Medical School (HMS and HSDM)
- Brief Summary
This study will test whether artificial intelligence (AI) can help doctors diagnose a rare blood cancer called acute promyelocytic leukemia (APL) more quickly and accurately. Doctors usually examine bone marrow samples under a microscope to make this diagnosis, but it can be challenging and time-consuming.
In this study, doctors will review bone marrow samples under three different conditions:
* Unaided Review: Without AI assistance.
* AI as Double-Check: AI-generated evaluation shown after the doctor makes an initial decision.
* AI as First Look: AI-generated evaluation shown at the start of the review.
Doctors will be randomly assigned to the order in which they complete these three conditions. This design will allow us to compare how AI support affects diagnostic accuracy, speed, and confidence.
- Detailed Description
This study aims to evaluate the effect of artificial intelligence (AI) assistance on clinicians' diagnostic performance in detecting acute promyelocytic leukemia (APL) using Wright-Giemsa-stained bone marrow whole-slide images (WSIs). The Leukemia End-to-End Analysis Platform (LEAP) will serve as the AI model under assessment.
This is a single-session, within-reader study. Participants will be randomly assigned to one of two study arms, which differ in the order of diagnostic blocks:
* Arm 1 (X -> Y):
  * Block X (Unaided Review): Clinicians review WSIs without AI support. Diagnostic accuracy, time to decision, and confidence will be recorded.
  * Block Y (AI-Assisted Review): Two sub-blocks presented in randomized order:
    * Y1 (AI as Double-Check): Clinicians provide an initial diagnosis and confidence score without AI assistance. AI predictions are then revealed, and clinicians may revise their diagnosis. Both pre-AI and post-AI decisions will be recorded.
    * Y2 (AI as First Look): Clinicians review WSIs with AI-predicted diagnoses visible from the beginning.
* Arm 2 (Y -> X):
  * Block Y (AI-Assisted Review): Sub-blocks Y1 and Y2 presented in randomized order.
  * Block X (Unaided Review): As described above.
Each clinician will review up to 120 de-identified WSIs. For each reader, slides will be randomly divided into three disjoint subsets (e.g., approximately 40/40/40), stratified by APL status, and assigned to Block X (Unaided), Block Y1 (AI as Double-Check), or Block Y2 (AI as First Look). No slide will be shown to the same reader in more than one block.
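For illustration only, a minimal sketch of how this per-reader, APL-stratified assignment of slides to the three blocks could be carried out (the slide identifiers, the `apl_status` labels, and the round-robin dealing are assumptions for this sketch, not protocol specifications):

```python
import random
from collections import defaultdict

def assign_slides_to_blocks(slide_ids, apl_status, seed=0):
    """Split a reader's slides into three disjoint, APL-stratified subsets,
    one per block (X: Unaided, Y1: AI as Double-Check, Y2: AI as First Look).
    `apl_status` maps slide_id -> True (APL) / False (non-APL)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sid in slide_ids:
        by_stratum[apl_status[sid]].append(sid)

    blocks = {"X": [], "Y1": [], "Y2": []}
    for stratum in by_stratum.values():
        rng.shuffle(stratum)
        # Deal slides round-robin so each block receives ~1/3 of each stratum.
        for i, sid in enumerate(stratum):
            blocks[("X", "Y1", "Y2")[i % 3]].append(sid)
    return blocks

# Hypothetical example: 120 slides, 12 APL-positive, one assignment per reader.
slides = [f"WSI_{i:03d}" for i in range(120)]
status = {sid: i < 12 for i, sid in enumerate(slides)}
per_reader = assign_slides_to_blocks(slides, status, seed=42)
assert sum(len(v) for v in per_reader.values()) == len(slides)
assert set().union(*per_reader.values()) == set(slides)  # disjoint and exhaustive
```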
In addition, the AI system will independently generate diagnostic predictions for all WSIs to enable benchmarking; however, this does not constitute a participant arm.
Ground-truth diagnoses will be determined by molecular confirmation and expert consensus.
Recruitment & Eligibility
- Status
- ENROLLING_BY_INVITATION
- Sex
- All
- Target Recruitment
- 10
Study & Design
- Study Type
- INTERVENTIONAL
- Study Design
- CROSSOVER
- Primary Outcome Measures
- Diagnostic performance of APL detection
  - Time: Periprocedural (at the time of slide review)
  - Method: Performance of clinicians (unaided and AI-assisted) in detecting APL, measured by accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
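A minimal sketch of how this primary measure could be computed from binary APL calls against the ground-truth reference (the function and variable names are illustrative assumptions, not part of the protocol):

```python
def apl_detection_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV for binary APL calls
    (True = APL, False = non-APL) against ground-truth labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)          # true positives
    tn = sum(not t and not p for t, p in pairs)  # true negatives
    fp = sum(not t and p for t, p in pairs)      # false positives
    fn = sum(t and not p for t, p in pairs)      # false negatives
    safe = lambda num, den: num / den if den else float("nan")
    return {
        "accuracy": safe(tp + tn, len(pairs)),
        "sensitivity": safe(tp, tp + fn),
        "specificity": safe(tn, tn + fp),
        "ppv": safe(tp, tp + fp),
        "npv": safe(tn, tn + fn),
    }

# Hypothetical usage: per-reader, per-condition metrics.
truth = [True, False, False, True, False]
calls = [True, False, True, True, False]
print(apl_detection_metrics(truth, calls))
```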
- Secondary Outcome Measures
- Time to diagnosis
  - Time: Periprocedural (at the time of slide review)
  - Method: Average time (seconds per case) required to finalize a diagnosis.
- Inter-observer variability
  - Time: Periprocedural (at the time of slide review)
  - Method: Agreement among clinicians across conditions, measured using inter-rater reliability metrics (e.g., kappa statistics; see the sketch after this list).
- Concordance between AI predictions and clinicians' diagnoses
  - Time: Periprocedural (at the time of slide review)
  - Method: The proportion of cases in which AI predictions match clinicians' decisions in each study condition.
- Decision-change rates
  - Time: Periprocedural (at the time of slide review)
  - Method: The proportion of cases in which a clinician's initial diagnosis is revised after exposure to AI assistance.
- Net benefit after AI exposure
  - Time: Periprocedural (at the time of slide review)
  - Method: The overall change in diagnostic accuracy attributable to AI assistance.
- Clinician confidence level
  - Time: Periprocedural (at the time of slide review)
  - Method: Self-reported diagnostic confidence recorded for each case on a 5-point scale: 5 - Absolutely Certain; 4 - Mostly Certain; 3 - Unsure; 2 - Very Doubtful; 1 - Random Guess (5 = highest confidence, 1 = lowest).
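For the inter-rater, decision-change, and net-benefit measures above, a minimal sketch under simplifying assumptions (two raters for Cohen's kappa and binary APL calls; the study may instead use multi-rater statistics such as Fleiss' kappa):

```python
def cohens_kappa(rater_a, rater_b):
    """Two-rater Cohen's kappa over categorical calls (e.g., APL / non-APL)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Expected chance agreement from each rater's marginal call frequencies.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected) if expected != 1 else float("nan")

def decision_change_rate(pre_ai, post_ai):
    """Proportion of cases in which the post-AI call differs from the pre-AI call."""
    return sum(a != b for a, b in zip(pre_ai, post_ai)) / len(pre_ai)

def net_benefit(pre_ai, post_ai, truth):
    """Change in accuracy attributable to AI exposure (post-AI minus pre-AI)."""
    accuracy = lambda calls: sum(c == t for c, t in zip(calls, truth)) / len(truth)
    return accuracy(post_ai) - accuracy(pre_ai)
```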
Trial Locations
- Locations (1)
Harvard Medical School
🇺🇸 Boston, Massachusetts, United States