MedPath

PROJECT 2 EXAMPLE: Feedback X Prevalence Using Dermatology Stimuli

Not Applicable
Completed
Conditions
Decision Making
Interventions
Behavioral: Feedback
Behavioral: Prevalence
Registration Number
NCT05244122
Lead Sponsor
Brigham and Women's Hospital
Brief Summary

Imagine that a dermatologist spends the morning seeing patients who have been referred for suspicion of skin cancer. Many of them do, in fact, have skin lesions that require treatment. For this set of patients, disease 'prevalence' would be high. Suppose that the next task is to spend the afternoon giving annual screening exams to members of the general population. Here disease prevalence will be low. Would the morning's work influence decisions about patients in the afternoon? It is known from other contexts that recent history can influence current decisions and that target prevalence has an impact on decisions. In this study, observers with varying degrees of expertise made decisions about images of skin lesions using an online medical image labelling app (DiagnosUs). This allowed examination of the effects of feedback history and prevalence in a single study. Blocks of trials could be of low or high prevalence, with or without feedback. Over 300,000 individual judgements were collected. (Taken from Wolfe, J. M. (2022). How one block of trials influences the next: Persistent effects of disease prevalence and feedback on decisions about images of skin lesions in a large online study. Cognitive Research: Principles and Implications, 7, 10. doi: https://doi.org/10.1186/s41235-022-00362-0)

Detailed Description

This description is based on a preregistration on the Open Science Framework site. Note that this is a "BESH" study. This type of research is not designed as a traditional clinical trial, but it is being reported here because of changes in NIH clinical trial reporting rules. This is one study from Project 2 of NE017001.

Levari et al (2018) found that people responded to a decrease in the prevalence of a stimulus by expanding their concept of it. Specifically, they asked observers to judge on each trial whether a dot, drawn from a blue-purple continuum, was blue or not. The results showed that observers were more likely to call ambiguous stimuli "blue" when blue items were less prevalent. In signal detection theory (SDT) terms, this is a liberal shift of response criterion. This is "prevalence-induced concept change" (PICC). However, a long series of experiments on prevalence effects has produced the opposite result. The standard finding is that Os miss more targets at low prevalence: when blue is rare, observers are less likely to call something blue. In SDT terms, this is a conservative criterion shift. This is the classic Low Prevalence Effect (LPE). In a round of earlier experiments, Lyu et al (2021) found that feedback is a critical variable: with trial-by-trial feedback, there is an LPE; with no feedback, the data usually show PICC results.

Do LPE and PICC effects show up when experts view stimuli in their expert domain? There is evidence for the LPE from search tasks (e.g., Evans, K. K., Birdwell, R. L., & Wolfe, J. M. (2013). If You Don't Find It Often, You Often Don't Find It: Why Some Cancers Are Missed in Breast Cancer Screening. PLoS ONE, 8(5), e64366. doi: 10.1371/journal.pone.0064366). However, PICC evidence has not been collected, and there are no data from single-item decision tasks like the "Is this dot blue?" task. This is important because criterion shifts of the sort described above can have obvious health care implications.

This study will repeat the basic "Is this dot blue?" experiment using dermatology stimuli (Is this a melanoma or just a nevus (a mole)?).

Hypotheses:

(H1) Without feedback, Os are more likely to label a spot as cancer when cancer prevalence is low (prevalence-induced concept change).

(H2) With feedback, Os are less likely to label a spot as cancer when cancer prevalence is low (the classic low prevalence effect).

Dependent variable

The main dependent variable is the proportion of cancer responses as a function of the cancer prevalence in the image set, but we will also record reaction times.

Conditions

How many and which conditions will participants be assigned to?

Four conditions will be run, between observers.

1. 50% cancer images with feedback

2. 50% cancer images without feedback

3. 20% cancer images with feedback

4. 20% cancer images without feedback

Observers will make a simple 2-alternative forced-choice (2AFC) cancer/no cancer decision.

Observers will be awarded points based on the correctness of their answers (more correct answers, more points).

There will be 200 trials in each block. That will produce 40 target-present trials in the low-prevalence conditions, which should yield a hit-rate estimate that is not too coarse.

Stimuli will be images of moles from the ISIC archive. Each image comes with a known answer of either melanoma (cancer) or nevus (negative).

Analyses

The observers' responses in the 50%-with-feedback condition will be used to order the stimuli along a continuum from not-cancer to cancer. This will yield a psychometric function rising (it may be assumed) from near 0% cancer responses to near 100%.

Using that ordering, psychometric functions will be generated for the other three conditions.
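As an illustrative sketch of this step (the per-image response proportions and the fitting routine below are hypothetical, not from the study), one could order the images by their proportion of "cancer" responses and fit a logistic psychometric function:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical per-image proportions of "cancer" responses in the
# 50%-prevalence, with-feedback condition
p_cancer = np.array([0.05, 0.12, 0.30, 0.48, 0.66, 0.85, 0.95])

order = np.argsort(p_cancer)      # not-cancer -> cancer ordering
x = np.arange(len(p_cancer))      # rank along the resulting continuum

def logistic(x, x0, k):
    """Two-parameter logistic: midpoint x0, slope k."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

(x0, k), _ = curve_fit(logistic, x, p_cancer[order], p0=[3.0, 1.0])
```

The same ordering (from the 50%-with-feedback data) would then be reused as the x-axis when plotting the other three conditions, so any criterion shift appears as a lateral shift of the fitted midpoint `x0`.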

To examine the effect of prevalence and the presence or absence of feedback on observers' response behavior, a logistic regression with prevalence and feedback as factors will be run as a generalized mixed model using jamovi software.

The data will also be used to compute the signal detection measures of sensitivity (d') and criterion (c) based on the actual truth about the images. That is, "cancer" responses will be coded as true positives if the image shows cancer and as false positives if it does not. T-tests will be performed to examine whether d' and/or c change significantly as a function of prevalence and feedback.
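A minimal sketch of that computation (the counts in the example are hypothetical, and adding 0.5 to every cell is one common reading of the 0.5 correction described under the outcome measures):

```python
from scipy.stats import norm

def dprime_criterion(hits, misses, fas, crs):
    """d' and criterion c from raw response counts, with a 0.5
    additive correction to each cell so that hit and false-alarm
    rates of exactly 0 or 1 stay finite under the z-transform."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (fas + 0.5) / (fas + crs + 1.0)
    d = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    c = -(norm.ppf(hit_rate) + norm.ppf(fa_rate)) / 2.0
    return d, c

# Hypothetical low-prevalence block: 40 targets, 160 non-targets
d, c = dprime_criterion(hits=24, misses=16, fas=8, crs=152)
# a positive c here would indicate a conservative (LPE-like) bias
```

Per-observer, per-condition d' and c values computed this way would then feed the planned t-tests.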

Outliers and Exclusions

N/A

Sample Size

Separate blocks of trials will be run and conditions will be compared with unpaired t-tests.

G*Power suggests 36 observers per group, or a total of 144 observers, for alpha = 0.05 and power = 0.80. The plan is to run 45 Os per group, anticipating about 20% loss of Os due to the vagaries of online testing.
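The 36-per-group figure is roughly what a standard unpaired t-test power calculation returns for a medium-to-large effect. A sketch (the effect size d = 0.67 is an assumption chosen to match the figure; it is not stated in the preregistration):

```python
import math
from statsmodels.stats.power import TTestIndPower

# Per-group sample size for an unpaired, two-sided t-test at
# alpha = 0.05 and power = 0.80, assuming Cohen's d = 0.67
n_per_group = TTestIndPower().solve_power(effect_size=0.67,
                                          alpha=0.05, power=0.80,
                                          alternative="two-sided")

# Inflate the target of 36 per group for ~20% anticipated attrition
planned_per_group = math.ceil(36 / (1 - 0.20))
```

With these assumptions, `n_per_group` comes out near 36 and `planned_per_group` reproduces the 45-per-group recruitment target.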

Recruitment & Eligibility

Status
COMPLETED
Sex
All
Target Recruitment
1121
Inclusion Criteria
  • All are welcome to enroll online
Exclusion Criteria
  • Under 18 years of age

Study & Design

Study Type
INTERVENTIONAL
Study Design
SINGLE_GROUP
Arms & Interventions
Group: Feedback X Prevalence Using Dermatology Stimuli
Interventions: Prevalence; Feedback
Description: In this experiment, observers (Os) completed blocks of 80 trials. On each trial, they saw an image of a spot on the skin and classified it as a melanoma (cancer) or a nevus (benign). Blocks could be of low prevalence (20% cancer cases, 16 images) or high prevalence (50%, 40 images). Os either received trial-by-trial "Feedback" about their performance accuracy or they did not. Thus, there were four types of block: Low prevalence, No Feedback; Low prevalence, Feedback; High prevalence, No Feedback; High prevalence, Feedback. Each of the four types of block was made available to Os on each of 6 days, and Os could elect to view each of the four blocks each day. The particular interest was in the effect of performing one block on performance in the immediately subsequent block.
Primary Outcome Measures
Name: Change in D' Between Pairs of Blocks
Time: Participants could be in the study for as little as two blocks in one day, up to 24 blocks collected over 6 days.

D' (d-prime) is defined as the z-transform of the true-positive rate minus the z-transform of the false-positive rate. A true positive is saying that a real melanoma is a melanoma. A false positive is saying that a nevus is a melanoma.

A correction of 0.5 is added to avoid calculation problems when the true-positive or false-positive rate is exactly 0 or 1 (where the z-transform is undefined).

A D' of zero indicates no ability to discriminate. D' > zero indicates some ability to discriminate.

The change of interest is the D' for Block 2 when it follows Block 1 compared to the D' for Block 2 averaged across all conditions.

Name: Change in Criterion Between Pairs of Blocks
Time: Participants could be in the study for as little as two blocks in one day, up to 24 blocks collected over 6 days.

Criterion, c, corresponds to the position of the midpoint between the z-transformed probabilities of hits (correct yes responses) and false alarms (incorrect yes responses). It is calculated as -[z(p(H)) + z(p(FA))]/2. The criterion, c, quantifies the distance from unbiased responding in units of standard deviations; c = 0 is unbiased. Negative values of c indicate a more relaxed criterion for saying yes. Positive values indicate a stricter criterion for saying yes.
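Under the equal-variance SDT model, z(H) = d'/2 - c and z(FA) = -d'/2 - c, so the sign convention can be checked directly: a positive (conservative) c lowers both the hit and false-alarm rates at fixed d'. A small sketch (the helper function and its inputs are illustrative):

```python
from scipy.stats import norm

def rates_from_dprime_c(d, c):
    """Predicted hit and false-alarm rates under equal-variance SDT:
    z(hit) = d/2 - c, z(fa) = -d/2 - c."""
    return norm.cdf(d / 2.0 - c), norm.cdf(-d / 2.0 - c)

unbiased = rates_from_dprime_c(2.0, 0.0)      # hit ~ .84, fa ~ .16
conservative = rates_from_dprime_c(2.0, 0.5)  # both rates drop
```

Note how the conservative shift trades misses for correct rejections while sensitivity (d') stays constant, which is the pattern the Low Prevalence Effect predicts.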

Secondary Outcome Measures

Trial Locations

Locations (1)

Visual Attention Lab, Brigham and Women's Hospital


Boston, Massachusetts, United States
