Psychometric Performance and Student Perceptions of AI- Versus Human-Generated Multiple-Choice Question Development in Medical Education: The AHEAD Randomized Controlled Trial
Overview
- Phase: Not Applicable
- Status: Completed
- Enrollment: 258
- Locations: 1
- Primary Endpoint: Student performance on the mock examination
Brief Summary
The Artificial Intelligence (AI) vs Human Exam Assessment and Development (AHEAD) Trial is a participant-blinded randomized controlled trial conducted among first-year medical students at the University of British Columbia. The study evaluates whether multiple-choice examination questions generated using large language models (LLMs) perform comparably to conventionally human-written questions in medical education.
Participants were randomized to complete one of two versions of a formative mock final examination consisting of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same course learning objectives. One exam version contained AI-generated questions produced using a structured LLM workflow with independent AI verification, while the other contained questions authored by senior medical students using conventional methods.
The study evaluates exam feasibility, psychometric reliability, validity, student acceptability, and educational impact. Outcomes include exam performance, item discrimination indices, distractor efficiency, student perceptions of exam quality and difficulty, and changes in perceived preparedness for the upcoming summative examination.
Detailed Description
The AHEAD Trial (AI vs Human Exam Assessment and Development) is a single-center, participant-blinded randomized controlled trial conducted among first-year Doctor of Medicine (MD) students enrolled in the Foundations of Medical Practice I (MEDD 411) course at the University of British Columbia.
Participants were randomized in a 1:1 ratio to complete either an AI-generated or a human-generated mock final examination. Both exams consisted of 112 case-based single-best-answer multiple-choice questions (MCQs) aligned with the same MEDD 411 curricular objectives.
AI-generated questions were produced using a structured workflow involving ChatGPT for question generation and Google Gemini for independent verification. Human-generated questions were authored by senior medical students without AI assistance and underwent independent peer review. Both exams followed identical formatting guidelines and assessed the same learning objectives.
All participants completed identical pre-exam and post-exam surveys assessing demographic characteristics, familiarity with artificial intelligence in education, and perceptions of the examination experience. The study evaluates the utility of AI-generated assessments using van der Vleuten's Assessment Utility Framework, including feasibility, reliability, validity, acceptability, and educational impact.
The trial aims to determine whether large language models can accelerate the development of formative medical examinations while maintaining comparable psychometric quality and educational value relative to traditional human-authored questions.
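The registry record contains no code; the sketch below is only an illustration of what a two-model "generate, then independently verify" workflow of the kind described above could look like in Python. The prompts, model identifiers ("gpt-4", "gemini-pro"), helper functions, and SDK parameters are assumptions for illustration, not the study's actual pipeline.

```python
# Illustrative only: a generate-then-verify MCQ workflow loosely mirroring the
# design described above (one model drafts items, a second model reviews them).
# Model names, prompts, and keys are placeholders; SDK details may differ by version.
from openai import OpenAI
import google.generativeai as genai

openai_client = OpenAI()                    # assumes OPENAI_API_KEY is set in the environment
genai.configure(api_key="YOUR_GEMINI_KEY")  # placeholder key
verifier = genai.GenerativeModel("gemini-pro")

GENERATION_PROMPT = (
    "Write one case-based, single-best-answer multiple-choice question with five "
    "options (A-E), one correct answer, and a brief explanation, aligned to this "
    "learning objective:\n{objective}"
)

VERIFICATION_PROMPT = (
    "Independently review this multiple-choice question for factual accuracy, a single "
    "defensible best answer, and alignment with the learning objective '{objective}'. "
    "Reply ACCEPT or REVISE with reasons.\n\n{question}"
)

def generate_mcq(objective: str) -> str:
    """Draft an MCQ for one learning objective with the generator model."""
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": GENERATION_PROMPT.format(objective=objective)}],
    )
    return response.choices[0].message.content

def verify_mcq(objective: str, question: str) -> str:
    """Ask the second, independent model to accept or flag the drafted item."""
    result = verifier.generate_content(
        VERIFICATION_PROMPT.format(objective=objective, question=question)
    )
    return result.text

if __name__ == "__main__":
    objective = "Explain the physiological basis of the oxygen-haemoglobin dissociation curve."
    draft = generate_mcq(objective)
    print(draft, "\n--- verifier ---\n", verify_mcq(objective, draft))
```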
Study Design
- Study Type: Interventional
- Allocation: Randomized
- Intervention Model: Parallel
- Primary Purpose: Other
- Masking: Single (Participant)
Masking Description
Participants were blinded to the source of the examination questions (AI-generated vs human-generated). All items were reviewed to remove indicators of authorship before distribution.
Eligibility Criteria
- Ages: 18 Years and older (Adult, Older Adult)
- Sex: All
- Accepts Healthy Volunteers: Yes
Inclusion Criteria
- Enrolled first-year medical students in the University of British Columbia MD undergraduate program.
- Able to voluntarily consent to participate in the formative mock examination study.
Exclusion Criteria
- Students who declined participation.
- Students who did not complete the mock examination or required surveys.
Arms & Interventions
AI-Generated MCQ Examination
Participants completed a 112-item case-based single-best-answer mock examination composed of AI-generated multiple-choice questions. Questions were generated using a structured large language model workflow with ChatGPT-4 for generation and Google Gemini for independent validation.
Intervention: AI-generated MCQ examination (Other)
Human-Generated MCQ Examination
Participants completed a 112-item case-based single-best-answer mock examination composed of human-authored multiple-choice questions developed by senior medical students using traditional item-writing methods and peer review.
Intervention: Human-generated MCQ examination (Other)
Outcomes
Primary Outcomes
Student performance on the mock examination
Time Frame: Immediately after completion of the mock examination
Comparison of mean examination scores between students randomized to the AI-generated versus human-generated mock examinations.
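The record does not specify the planned statistical analysis for this comparison. As one plausible illustration only, mean scores in the two arms could be compared with Welch's two-sample t-test; the scores in the sketch below are fabricated placeholders, not trial data.

```python
# Illustrative sketch of the primary comparison: mean mock-exam scores in the
# AI-generated arm versus the human-generated arm. Welch's t-test is shown as
# one common choice; it is not stated in the registry record.
import numpy as np
from scipy import stats

ai_arm_scores = np.array([72.5, 68.0, 81.2, 75.9, 70.3])       # placeholder data
human_arm_scores = np.array([74.1, 69.8, 79.5, 73.2, 71.0])    # placeholder data

t_stat, p_value = stats.ttest_ind(ai_arm_scores, human_arm_scores, equal_var=False)
mean_diff = ai_arm_scores.mean() - human_arm_scores.mean()

print(f"Mean difference (AI - human): {mean_diff:.2f} points")
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3f}")
```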
Secondary Outcomes
- Item discrimination index (Immediately after completion of the mock examination; see the illustrative computation after this list)
- Distractor efficiency (Immediately after completion of the mock examination)
- Student-rated examination quality and acceptability (Immediately after completion of the mock examination)
- Efficiency ratio of MCQ development time per matched learning objective (Baseline, prior to participant testing)
- Change in perceived preparedness for the summative examination (Before and immediately after completion of the mock examination)
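For readers unfamiliar with the item-level metrics above, the sketch below computes a discrimination index and a distractor efficiency ratio using common classical-test-theory conventions. The trial record does not state its exact definitions, so the upper/lower 27% grouping and the 5% "functional distractor" threshold are assumptions, and all data are placeholders.

```python
# Illustrative computation of two item-level secondary outcomes (assumed definitions):
#   - discrimination index: proportion correct in the top-scoring 27% of examinees
#     minus the proportion correct in the bottom 27%.
#   - distractor efficiency: share of distractors chosen by at least 5% of examinees.
import numpy as np

def discrimination_index(item_correct: np.ndarray, total_scores: np.ndarray, frac: float = 0.27) -> float:
    """Upper-group minus lower-group proportion correct for one item."""
    n_group = max(1, int(round(frac * len(total_scores))))
    order = np.argsort(total_scores)            # examinees sorted by total exam score
    lower, upper = order[:n_group], order[-n_group:]
    return item_correct[upper].mean() - item_correct[lower].mean()

def distractor_efficiency(option_counts: dict, correct_option: str, threshold: float = 0.05) -> float:
    """Fraction of distractors selected by at least `threshold` of examinees."""
    total = sum(option_counts.values())
    distractors = {k: v for k, v in option_counts.items() if k != correct_option}
    functional = sum(1 for v in distractors.values() if v / total >= threshold)
    return functional / len(distractors)

# Placeholder data for a single item answered by ten examinees.
item_correct = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])            # 1 = answered correctly
total_scores = np.array([95, 60, 88, 90, 55, 80, 62, 85, 92, 58])  # total exam scores
print(discrimination_index(item_correct, total_scores))             # 1.0 for this toy data
print(distractor_efficiency({"A": 6, "B": 2, "C": 1, "D": 1, "E": 0}, correct_option="A"))  # 0.75
```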
Investigators
Anita Palepu
Professor of Medicine, University of British Columbia
University of British Columbia