MedPath

AI-Assisted Acute Myeloid Leukemia Evaluation With the Leukemia End-to-End Analysis Platform (LEAP) Versus Clinician-Only Assessment

Phase
Not Applicable
Conditions
Acute Promyelocytic Leukemia (APL)
Acute Myeloid Leukemia (AML)
Registration Number
NCT07203885
Lead Sponsor
Harvard Medical School (HMS and HSDM)
Brief Summary

This study will test whether artificial intelligence (AI) can help doctors diagnose a rare blood cancer called acute promyelocytic leukemia (APL) more quickly and accurately. Doctors usually examine bone marrow samples under a microscope to make this diagnosis, but it can be challenging and time-consuming.

In this study, doctors will review bone marrow samples under three different conditions:

* Unaided Review: Without AI assistance.

* AI as Double-Check: AI-generated evaluation shown after the doctor makes an initial decision.

* AI as First Look: AI-generated evaluation shown at the start of the review.

Doctors will be randomly assigned to different orders of these three conditions. This design will allow us to compare how AI support affects diagnostic accuracy, speed, and confidence.

Detailed Description

This study aims to evaluate the effect of artificial intelligence (AI) assistance on clinicians' diagnostic performance in detecting acute promyelocytic leukemia (APL) using Wright-Giemsa-stained bone marrow whole-slide images (WSIs). The Leukemia End-to-End Analysis Platform (LEAP) will serve as the AI model under assessment.

This is a single-session, within-reader study. Participants will be randomly assigned to one of two study arms, which differ in the order of diagnostic blocks:

* Arm 1 (X → Y):

  * Block X (Unaided Review): Clinicians review WSIs without AI support. Diagnostic accuracy, time to decision, and confidence will be recorded.

  * Block Y (AI-Assisted Review): Comprising two sub-blocks presented in randomized order:

    * Y1 (AI as Double-Check): Clinicians provide an initial diagnosis and confidence score without AI assistance. AI predictions are then revealed, and clinicians may revise their diagnosis. Both pre-AI and post-AI decisions will be recorded.

    * Y2 (AI as First Look): Clinicians review WSIs with AI-predicted diagnoses visible from the beginning.

* Arm 2 (Y → X):

  * Block Y (AI-Assisted Review): Sub-blocks Y1 and Y2 presented in randomized order.

  * Block X (Unaided Review): As described above.

Each clinician will review up to 120 de-identified WSIs. For each reader, slides will be randomly divided into three disjoint subsets (e.g., approximately 40/40/40), stratified by APL status, and assigned to Block X (Unaided), Block Y1 (AI as Double-Check), or Block Y2 (AI as First Look). No slide will be shown to the same reader in more than one block.
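The per-reader allocation described above (three disjoint, APL-stratified subsets of roughly equal size) can be sketched as follows. Function and variable names are illustrative, not part of the study protocol:

```python
import random

def stratified_three_way_split(slides, seed=0):
    """Split (slide_id, is_apl) pairs into three disjoint subsets,
    one per block (X, Y1, Y2), balanced on APL status."""
    rng = random.Random(seed)
    apl = [sid for sid, is_apl in slides if is_apl]
    non_apl = [sid for sid, is_apl in slides if not is_apl]
    rng.shuffle(apl)
    rng.shuffle(non_apl)
    blocks = {"X": [], "Y1": [], "Y2": []}
    # Deal each stratum round-robin so APL prevalence is equal per block.
    for stratum in (apl, non_apl):
        for i, sid in enumerate(stratum):
            blocks[("X", "Y1", "Y2")[i % 3]].append(sid)
    return blocks
```

With 120 slides (e.g., 30 APL-positive), each block receives 40 slides including 10 APL cases, and no slide appears in more than one block for a given reader.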

In addition, the AI system will independently generate diagnostic predictions for all WSIs to enable benchmarking; however, this does not constitute a participant arm.

Ground-truth diagnoses will be determined by molecular confirmation and expert consensus.

Recruitment & Eligibility

Status
ENROLLING_BY_INVITATION
Sex
All
Target Recruitment
10
Inclusion Criteria

Not provided

Exclusion Criteria

Not provided

Study & Design

Study Type
INTERVENTIONAL
Study Design
CROSSOVER
Primary Outcome Measures
Name: Diagnostic performance of APL detection
Time: Periprocedural (at the time of slide review)
Method: Performance of clinicians (unaided and AI-assisted) in detecting APL, measured as accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
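All five listed metrics derive from a per-reader confusion matrix over APL-positive and APL-negative slides; a minimal sketch (names are illustrative):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Binary diagnostic metrics from confusion counts,
    treating APL-positive as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```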

Secondary Outcome Measures
Name: Time to diagnosis
Time: Periprocedural (at the time of slide review)
Method: Average time (seconds per case) required to finalize a diagnosis.

Name: Inter-observer variability
Time: Periprocedural (at the time of slide review)
Method: Agreement among clinicians across conditions, measured using inter-rater reliability metrics (e.g., kappa statistics).
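One common inter-rater reliability metric, Cohen's kappa for a pair of readers, corrects raw agreement for chance agreement: kappa = (p_o − p_e) / (1 − p_e). A generic sketch, not code from the study:

```python
from collections import Counter

def cohens_kappa(reads1, reads2):
    """Cohen's kappa for two readers' per-case labels (equal length)."""
    n = len(reads1)
    p_o = sum(a == b for a, b in zip(reads1, reads2)) / n  # observed agreement
    c1, c2 = Counter(reads1), Counter(reads2)
    labels = set(reads1) | set(reads2)
    p_e = sum(c1[l] * c2[l] for l in labels) / (n * n)     # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields kappa = 1, while agreement no better than chance yields kappa = 0.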

Name: Concordance between AI predictions and clinicians' diagnoses
Time: Periprocedural (at the time of slide review)
Method: The proportion of cases in which AI predictions match clinicians' decisions in each study condition.

Name: Decision-change rates
Time: Periprocedural (at the time of slide review)
Method: The proportion of cases in which a clinician's initial diagnosis is revised after exposure to AI assistance.

Name: Net benefit after AI exposure
Time: Periprocedural (at the time of slide review)
Method: The overall change in diagnostic accuracy attributable to AI assistance.
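Both the decision-change rate and the net benefit can be derived from paired pre-AI and post-AI reads against ground truth, as in this hypothetical sketch (function and variable names are illustrative):

```python
def change_rate_and_net_benefit(pre_ai, post_ai, truth):
    """Given per-case pre-AI diagnoses, post-AI diagnoses, and
    ground-truth labels, return (decision-change rate,
    change in accuracy attributable to AI exposure)."""
    n = len(truth)
    change_rate = sum(a != b for a, b in zip(pre_ai, post_ai)) / n
    acc_pre = sum(a == t for a, t in zip(pre_ai, truth)) / n
    acc_post = sum(b == t for b, t in zip(post_ai, truth)) / n
    return change_rate, acc_post - acc_pre
```

A positive net benefit means revisions after seeing AI predictions improved accuracy on balance; a negative value means the AI prompted more harmful than helpful changes.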

Name: Clinician confidence level
Time: Periprocedural (at the time of slide review)
Method: Self-reported diagnostic confidence recorded for each case on a 5-point scale, with 5 being the highest confidence and 1 the lowest:

5 - Absolutely Certain
4 - Mostly Certain
3 - Unsure
2 - Very Doubtful
1 - Random Guess

Trial Locations

Locations (1)

Harvard Medical School
🇺🇸 Boston, Massachusetts, United States
