Exploring the Potential of Artificial Intelligence for Earlier Breast Cancer Detection: A Retrospective Multi-Reader Study Based on AI-Assisted Mammographic Interpretation (EARLIEST-AI)
Overview
- Phase
- Not Applicable
- Status
- Active, not recruiting
- Sponsor
- Mammograaf Radioloogiakliinik
- Enrollment
- 785
- Locations
- 1
- Primary Endpoint
- Standalone reader sensitivity
Brief Summary
EARLIEST-AI Study type: Two-phase, multi-reader, blinded retrospective observational study based on data from a single clinic.
The primary objective of the study is to assess the sensitivity and specificity of radiologists and Artificial Intelligence (AI) in interpreting mammographic examinations for breast cancer detection in a scenario where no previous examinations are available, and to compare the diagnostic performance of radiologists with and without AI support.
The secondary objectives of the study are to assess the independent diagnostic performance of Computer Aided Detection (CAD) software, including sensitivity and specificity in identifying histopathologically confirmed breast cancer cases; to assess inter-reader and intra-reader variability in interpretation with and without AI support; to assess the agreement between AI outputs and histopathological findings; and to assess the impact of mammogram technical parameters on AI performance.
Time frame: Review of a subset of mammograms randomly selected from those performed between January 1, 2012 and December 31, 2024. Imaging findings were classified according to the Breast Imaging Reporting and Data System (BI-RADS).
Inclusion criteria:
- Female patients aged 30 years or older.
- One or more mammograms performed between January 1, 2012 and December 31, 2024.
- Meets one of the following criteria:
  3.1. histopathologically confirmed diagnosis of breast cancer within 6 months of the mammogram, or
  3.2. two consecutive mammograms rated BI-RADS 1 or BI-RADS 2.
Exclusion criteria:
- Mammograms of poor quality or artifacts that do not allow for reliable assessment.
- History of breast surgery or previous breast cancer treatment that has significantly altered breast morphology.
- Data deficiencies, including:
  3.1. lack of previous histopathological data, or
  3.2. lack of available follow-up data.
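The inclusion and exclusion criteria can be expressed as a screening function. The sketch below is illustrative only: `Exam`, `Candidate` and `is_eligible` are hypothetical names, "within 6 months" is approximated as 183 days after the mammogram, age is simplified to one value per candidate, and both consecutive BI-RADS 1/2 examinations are assumed to fall inside the 2012-2024 window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

WINDOW_START = date(2012, 1, 1)
WINDOW_END = date(2024, 12, 31)
# Assumption: "within 6 months" is approximated as 183 days after the mammogram.
CANCER_WINDOW = timedelta(days=183)


@dataclass
class Exam:
    exam_date: date
    birads: int  # BI-RADS category assigned in clinical practice


@dataclass
class Candidate:
    age: int  # simplification: one age value per candidate
    exams: List[Exam]
    cancer_diagnosis_date: Optional[date] = None
    poor_image_quality: bool = False  # exclusion: unreliable image quality
    altered_morphology: bool = False  # exclusion: surgery/treatment altered the breast
    missing_data: bool = False  # exclusion: no histopathology or follow-up data


def is_eligible(c: Candidate) -> bool:
    """Hypothetical screen mirroring the registry's inclusion/exclusion criteria."""
    if c.age < 30:
        return False
    in_window = sorted(
        (e for e in c.exams if WINDOW_START <= e.exam_date <= WINDOW_END),
        key=lambda e: e.exam_date,
    )
    if not in_window:
        return False
    # Criterion 3.1: cancer confirmed within ~6 months after an in-window mammogram.
    cancer_case = c.cancer_diagnosis_date is not None and any(
        timedelta(0) <= c.cancer_diagnosis_date - e.exam_date <= CANCER_WINDOW
        for e in in_window
    )
    # Criterion 3.2: two consecutive in-window mammograms rated BI-RADS 1 or 2.
    negative_case = any(
        a.birads in (1, 2) and b.birads in (1, 2)
        for a, b in zip(in_window, in_window[1:])
    )
    if not (cancer_case or negative_case):
        return False
    return not (c.poor_image_quality or c.altered_morphology or c.missing_data)
```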
Detailed Description
DETAILED SUMMARY AND JUSTIFICATION OF THE PROPOSED RESEARCH
Screening mammograms are routinely performed on asymptomatic women every two years. During a screening examination, four images are taken of each woman: two in the craniocaudal (CC) projection and two in the mediolateral oblique (MLO) projection. Each examination is assessed using the Breast Imaging Reporting and Data System (BI-RADS).
In daily clinical practice at Mammograaf Radioloogiakliinik (AS Mammograaf), mammographic examinations are evaluated with the BI-RADS system.
A BI-RADS 1 score is assigned to mammograms with no pathological changes. A BI-RADS 2 score indicates benign changes.
Probably benign, suspicious, and highly suspicious (malignant-appearing) changes are scored BI-RADS 3, BI-RADS 4 and BI-RADS 5, respectively.
BI-RADS 6 denotes histopathologically confirmed breast cancer. The mammogram reading protocol used in daily clinical practice at Mammograaf Radioloogiakliinik is described below.
Screening mammograms are always read independently by two radiologists (independent double reading). If both radiologists independently assign BI-RADS 1 or BI-RADS 2, this is considered a blind consensus: the mammogram shows no equivocal or malignant-appearing changes, and the patient needs no further examination. In BI-RADS 3, 4 and 5 cases, an unblinded consensus reading is used: the radiologists review the mammograms together and reach a consensus based on the available findings and a discussion of the case with colleagues. The daily work of both the principal investigator and the experienced readers participating in the study includes reviewing prior mammograms, analyzing complex cases, and analyzing interval cancers. Special reading strategies are used to reduce the subjectivity of the evaluation process.
One such method is blinded review, in which prior mammograms are reviewed without knowledge of the location of the pathological finding, to avoid hindsight (retrospective) bias.
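The double-reading rule described above can be summarized in a few lines. This is a minimal sketch: `route_double_read` is a hypothetical helper, the category labels follow the standard BI-RADS assessment categories, and it assumes that a score of BI-RADS 3 or higher from either reader triggers the unblinded consensus reading (the text states only that BI-RADS 3, 4 and 5 cases go to consensus).

```python
# Standard BI-RADS assessment categories used in the text (category 0 omitted).
BIRADS = {
    1: "negative",
    2: "benign",
    3: "probably benign",
    4: "suspicious",
    5: "highly suggestive of malignancy",
    6: "known biopsy-proven malignancy",
}


def route_double_read(reader_a: int, reader_b: int) -> str:
    """Hypothetical routing of an independent double read."""
    for score in (reader_a, reader_b):
        if score not in BIRADS:
            raise ValueError(f"unknown BI-RADS category: {score}")
    if reader_a in (1, 2) and reader_b in (1, 2):
        return "blind consensus: no further examination"
    # Assumption: BI-RADS >= 3 from either reader sends the case to consensus.
    return "unblinded consensus reading"
```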
The cohort must be large enough to detect statistically significant differences without placing an excessive burden on the readers.
The usual workload of Mammograaf Radioloogiakliinik breast radiologists is as follows.
Experienced radiologists can evaluate up to 500 mammograms per day without a decrease in quality; for less experienced radiologists, the optimal workload is up to 200 mammograms per day.
Using artificial intelligence in daily work requires an adjustment period, specific skills, and an understanding of how the system works.
Three notable aspects emerged during 1.5 years of working with artificial intelligence.
1. Human cognitive errors. Two main cognitive errors can occur in radiology when working with an artificial intelligence system: confirmation bias and anchoring. How they manifest depends on whether the radiologist looks at the mammograms first and only then at the artificial intelligence markings, or the other way around, starting from the artificial intelligence markings.
The review of the compiled dataset is structured to assess the extent to which such errors manifest in routine practice and which reading scenario may be most suitable for daily work.
2. Apparent false-positive markings by artificial intelligence. When working with an artificial intelligence system, false-positive markings were observed: the system assigned high- or medium-risk markings to mammograms that experienced radiologists had classified as BI-RADS 1 or BI-RADS 2.
Because the final decision currently always rests with the radiologist, artificial intelligence markings that conflict with the radiologist's professional assessment are often disregarded.
In 2024, Hera artificial intelligence marked 131 of the 138 cancers diagnosed during screening at Mammograaf Radioloogiakliinik, a sensitivity of 94.9% (131/138).
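For reference, 131 detected cancers out of 138 give a point estimate of about 0.949. The sketch below adds a 95% Wilson score interval to show the uncertainty around that figure; the registry does not specify an interval method, so the Wilson choice and the `wilson_interval` helper are assumptions.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


sensitivity = 131 / 138  # cancers marked by the AI / cancers diagnosed at screening, 2024
low, high = wilson_interval(131, 138)
print(f"sensitivity {sensitivity:.3f}, 95% CI {low:.3f}-{high:.3f}")
```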
The specificity of Hera AI could be assessed in two ways:
- Option 1: retrospectively analyzing the 2024 results after about two years of follow-up.
- Option 2: analyzing the dataset created for this study, which includes BI-RADS 3, BI-RADS 4 and BI-RADS 5 mammograms as well as mammograms without visible pathology whose BI-RADS 1 or BI-RADS 2 assessment was confirmed at subsequent screenings.
The latter option is preferable, as it may clarify how to handle so-called false-positive markings in the near future. At present, the approach to them is subjective and depends largely on the radiologist's personality and reading style.
3. Technical quality of mammograms. In daily practice, parameters such as breast positioning and the compression applied during acquisition have been observed to affect image quality and the performance of both the radiologist and the AI. This study will make it possible to assess objectively the relationship between technical parameters and the sensitivity and specificity of AI.
In planning this study, three hypotheses were formulated.
- Hypothesis 1: The use of AI support allows for the diagnosis of breast cancer at an earlier stage compared to the current practice based on radiologists' judgment.
- Hypothesis 2: Radiologists' attitude towards AI recommendations depends on their experience.
- Hypothesis 3: The technical parameters of mammograms significantly affect the specificity and sensitivity of AI.
STUDY TIME
This study has a two-phase design involving multiple readers and multiple reading sessions. A break period is planned between the two phases.
The study schedule is as follows:
- Initial data collection: 2 months
- Reading sessions: 3 × 2 months
- Reader break period: 1 month
- Statistical analysis and formalization of conclusions: 3 months
The total expected duration of the study is therefore 12 months. Study activities are planned to start in October 2025.
Personal data will be deleted on or about 1 January 2026, after which only fully anonymized data will be retained for further scientific analysis.
DETAILED DESCRIPTION OF SUBJECTS AND THEIR RECRUITMENT METHOD
1. A retrospective study in which an anonymized dataset of mammograms will be compiled.
All mammograms were performed at Mammograaf Radioloogiakliinik. The dataset consists of 1,000 mammographic sets, each containing four images of one woman: craniocaudal (CC) and mediolateral oblique (MLO) images of the right and left breasts. Compiling the dataset involves the following steps:
- Compile a list of mammograms related to BI-RADS 6 cases, recording the following data before anonymization: patient ID code and age; dates of the mammograms related to the diagnosis; cancer type (invasive ductal carcinoma, invasive lobular carcinoma, ductal carcinoma in situ); estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status and Ki-67 proliferation index from the biopsy; and the BI-RADS score of each mammogram.
- Compile a list of mammograms not related to BI-RADS 6 cases, recording the following data before anonymization: patient ID code and age, dates of the mammograms included in the dataset, and the BI-RADS score of each mammogram.
- Compile a radiological gold standard consisting of BI-RADS 6 cases. In addition to the BI-RADS score, the region of interest (ROI) of any possible finding is also requested in this study. Reviewing a set of 100 mammograms in this way is estimated to take an experienced radiologist approximately 30 minutes and a less experienced radiologist 60-90 minutes.
This workload does not significantly disrupt the daily workflow.
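The reading burden per session can be estimated from these figures. A rough sketch, assuming each reader reads the full 1,000-set dataset in every session at the per-100 rates above, and using the routine daily capacities stated earlier (up to 500 and 200 mammograms per day) as upper bounds; `session_minutes` and `reading_days` are hypothetical helpers.

```python
from math import ceil

DATASET_SIZE = 1000  # mammographic sets in the study dataset


def session_minutes(n_sets: int, minutes_per_100: float) -> float:
    """Estimated reading time for one session at the stated rate per 100 sets."""
    return n_sets / 100 * minutes_per_100


def reading_days(n_sets: int, daily_capacity: int) -> int:
    """Minimum number of working days at the stated maximum daily workload."""
    return ceil(n_sets / daily_capacity)


print(session_minutes(DATASET_SIZE, 30))  # experienced reader: 300.0 min (5 h)
print(session_minutes(DATASET_SIZE, 60), session_minutes(DATASET_SIZE, 90))  # 600.0 900.0
print(reading_days(DATASET_SIZE, 500), reading_days(DATASET_SIZE, 200))  # 2 5
```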
For non-cancer mammograms, only cases whose negative assessment (BI-RADS 1 or 2) was confirmed at a subsequent screening were included.
For each mammographic study included in the dataset, the corresponding BI-RADS assessment determined during clinical work is recorded in the protocol to ensure transparency and reliability of the study.
The sample size was calculated from the expected difference in sensitivity between the reference test (reading without AI support) and the test under evaluation (reading with AI support). The expected sensitivity is 0.85 for the reference test and 0.90 for the test under evaluation. A two-proportion Z-test was used to estimate the sample size, with a desired statistical power (1 − β) of 0.90 and a significance level α of 0.05. Based on these parameters, the required sample size was calculated as 914, rounded up to 1,000 cases for the study.
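The reported figure can be reproduced approximately with the standard normal-approximation formula for comparing two proportions. The sketch assumes a two-sided test and the unpooled-variance form, and returns the unrounded number per comparison group; the registry does not state which variant was used, and `two_proportion_sample_size` is a hypothetical helper.

```python
from statistics import NormalDist


def two_proportion_sample_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Unrounded per-group n for a two-sided two-proportion Z-test.

    n = (z_{1-alpha/2} + z_{1-beta})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.960 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # ~1.282 for power = 0.90
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2


n = two_proportion_sample_size(0.85, 0.90)
print(f"{n:.1f}")  # about 914
```

Under these assumptions the result is just above 914, so the reported value corresponds to rounding to the nearest integer; the more conservative convention of rounding up would give 915, which is still well below the 1,000 cases used.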
A calculation based on the McNemar test (90% vs. 87% sensitivity, α = 0.05, power 0.90, prevalence ~30%) gave a minimum sample size of 876, which was conservatively rounded up to 1,000 to avoid the risk of insufficient statistical power.
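A paired (McNemar) sample-size calculation needs, besides the two sensitivities, the expected proportion of cancer cases on which the two reading conditions disagree. The registry does not state this value, so the 876 figure cannot be reproduced here. The sketch below uses Connor's normal approximation, n = [z₁₋α/₂ √ψ + z₁₋β √(ψ − δ²)]² / δ², with a purely hypothetical discordant proportion ψ (`discordant`), and converts the required number of cancer cases into a total dataset size via the stated ~30% prevalence; `mcnemar_sample_size` is a hypothetical helper.

```python
from math import ceil, sqrt
from statistics import NormalDist
from typing import Tuple


def mcnemar_sample_size(
    sens_a: float,
    sens_b: float,
    discordant: float,
    prevalence: float,
    alpha: float = 0.05,
    power: float = 0.90,
) -> Tuple[int, int]:
    """Connor's approximation for a two-sided paired (McNemar) comparison.

    discordant: hypothetical proportion of cancer cases on which the two
    reading conditions disagree (p10 + p01); not stated in the registry.
    Returns (cancer cases needed, total cases at the given prevalence).
    """
    delta = abs(sens_a - sens_b)  # equals |p10 - p01|
    if delta == 0:
        raise ValueError("sensitivities must differ")
    if not delta <= discordant <= 1:
        raise ValueError("discordant must lie between |sens_a - sens_b| and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_cases = ceil(
        (z_alpha * sqrt(discordant) + z_beta * sqrt(discordant - delta**2)) ** 2
        / delta**2
    )
    return n_cases, ceil(n_cases / prevalence)


# Hypothetical discordance of 6%; the study's actual assumption is not reported.
print(mcnemar_sample_size(0.90, 0.87, discordant=0.06, prevalence=0.30))
```

The required size grows with the assumed discordance, which is why this input should be reported alongside any paired sample-size figure.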
For this reason, a dataset comprising 1,000 mammograms was constructed. Consultation with an independent biostatistician is being considered.
2. Professional experience of readers
For the purposes of this study, an expert is defined as a radiologist who works daily in breast radiology, including evaluating mammograms, performing breast ultrasound examinations, taking biopsies under ultrasound and/or magnetic resonance imaging (MRI) guidance, and reporting breast MRI examinations, and who has:
- at least 10 years of clinical experience in breast radiology,
- an annual reading volume of at least 12,000 mammographic examinations,
- daily work experience with the use of artificial intelligence for at least 12 months.
The principal investigator is a breast radiologist with approximately 20 years of experience in evaluating mammographic examinations, having evaluated over 77,000 examinations in the last five years (an average of over 15,000 per year). The principal investigator also performs breast ultrasound examinations, takes biopsies under both ultrasound and MRI guidance, reports breast MRI examinations, and has 18 months of daily experience using artificial intelligence in mammography interpretation. In addition, the principal investigator has routinely evaluated the positioning quality of mammographic examinations for at least seven years.
The principal investigator will act as an expert in compiling the dataset and validating the gold standard.
- Reader 1: has approximately 30 years of experience in evaluating mammography studies, with over 76,000 mammographic studies evaluated in the past five years, averaging over 15,000 studies per year.
- Reader 2: has approximately 10 years of experience in evaluating mammography studies, with over 31,000 mammographic studies evaluated in the past five years, averaging over 6,000 studies per year.
- Reader 3: has approximately 1 year of experience in evaluating mammography studies, with 4,324 mammographic studies evaluated in the past year.
- Reader 4: has less than 1 year of experience in evaluating mammography studies, with 2,963 mammographic studies evaluated in the past year.
- Reader 5: has less than 1 year of experience in evaluating mammography studies, with 2,242 mammographic studies evaluated in the past year.
The range of reader experience is essential to this study: it allows assessment of how artificial intelligence-proposed interpretations of findings influence radiologists with different levels of experience, and thus whether the second hypothesis is confirmed or refuted.
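The quantitative parts of the expert definition can be checked mechanically against the reader profiles. In this sketch, `Reader` and `expert_status` are hypothetical; annual volume is taken as the five-year total divided by five (or the single-year count for the newer readers), the qualitative requirement of daily breast-radiology work is not encoded, and AI experience is treated as unknown wherever the text does not report it.

```python
from dataclasses import dataclass
from typing import Optional

MIN_YEARS = 10
MIN_ANNUAL_READS = 12_000
MIN_AI_MONTHS = 12


@dataclass
class Reader:
    label: str
    years_experience: float
    annual_reads: float
    ai_months: Optional[int]  # None when not reported


def expert_status(r: Reader) -> str:
    """Apply the quantitative criteria of the study's expert definition."""
    if r.years_experience < MIN_YEARS or r.annual_reads < MIN_ANNUAL_READS:
        return "not expert"
    if r.ai_months is None:
        return "undetermined"  # AI experience not reported
    return "expert" if r.ai_months >= MIN_AI_MONTHS else "not expert"


readers = [
    Reader("Principal investigator", 20, 77_000 / 5, 18),
    Reader("Reader 1", 30, 76_000 / 5, None),
    Reader("Reader 2", 10, 31_000 / 5, None),
    Reader("Reader 3", 1, 4_324, None),
]
for r in readers:
    print(r.label, expert_status(r))
```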
DETAILED DESCRIPTION OF THE STUDY METHODOLOGY
The dataset composition, study process, and analysis plan are deliberately withheld from the readers to maintain the independence of their assessments and avoid cognitive bias. The full methodology is documented in the study protocol (Sections 6, 7 and 12), and the corresponding information will be made available to the participants after the reading phases end.
Such a dataset makes it possible to assess the ability of readers and AI to detect breast cancer with limited information, that is, when only a single mammographic examination is available, without clinical information or prior studies.
As described in Section 6.2.1 of the protocol, this dataset configuration allows the first hypothesis to be tested.
The dataset also provides quantitative answers to the following questions, which address the second and third hypotheses:
- What is the specificity and sensitivity of the AI-based CAD system?
- What is the specificity and sensitivity of human readers?
- Do CAD cues affect human readers' performance?
- Does human reader performance depend on when CAD cues are presented - at the beginning of the viewing or after the reader's initial assessment?
- What factors affect the diagnostic performance of human readers and an AI-based CAD system?
In particular:
- Reader-related factors: level of experience
- Patient-related factors: age, breast density
- Malignant process characterization: tumor type, grade, size, visibility in only one or both projections (CC and/or MLO)
- Technical parameters of mammograms: compression force, compression pressure, correct positioning
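Sensitivity and specificity for a single reader reduce to counts against the confirmed ground truth. A minimal sketch, assuming that a BI-RADS score of 3 or higher counts as a positive call; the registry does not define the operating threshold, and `sens_spec` is a hypothetical helper. The same function can be applied per subgroup (for example by breast density or reader experience) to address the factor questions above.

```python
from typing import Iterable, Tuple

# Assumption: BI-RADS 3 or higher counts as a positive (recall) call.
POSITIVE_THRESHOLD = 3


def sens_spec(calls: Iterable[Tuple[int, bool]], threshold: int = POSITIVE_THRESHOLD) -> Tuple[float, float]:
    """Return (sensitivity, specificity).

    calls: (BI-RADS score given by the reader, True if the case is a
    histopathologically confirmed cancer).
    """
    tp = fn = tn = fp = 0
    for score, is_cancer in calls:
        positive = score >= threshold
        if is_cancer:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy example: 3 cancers (2 called positive), 4 non-cancers (1 false positive).
print(sens_spec([(5, True), (4, True), (2, True), (1, False), (2, False), (3, False), (1, False)]))
```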
DESCRIPTION OF THE COORDINATING RESEARCHER OR RESPONSIBLE RESEARCHER ON ETHICAL AND DATA PROTECTION ASPECTS OF THE STUDY
1. Ethical aspects
The study is conducted by Mammograaf Radioloogiakliinik (reg. no. 10233840) and HERA-MI (Siret: 828871699).
HERA-MI develops and manufactures artificial intelligence software; Mammograaf Radioloogiakliinik is a healthcare provider that uses HERA-MI artificial intelligence in its daily practice. Mammograaf Radioloogiakliinik has been a HERA-MI reference center since 2024. There are no financial transactions between the parties involved in this research.
Mammograaf Radioloogiakliinik and HERA-MI will prepare and sign a declaration of avoidance of conflict of interest to ensure the transparency and reliability of the study.
On completion, the study is expected to improve knowledge of in-house performance and understanding of the limits of the artificial intelligence tool's reliability.
The study only uses data collected in the course of routine practice, and the subjects will not be contacted for the purpose of conducting the research. Participation in the study will not affect the patient's life, treatment, or follow-up. The study will not impose any additional obligations on the subjects or their loved ones.
Participation in the study will not directly benefit the subjects or their loved ones.
Indirect benefits to the subjects may arise through optimization of patient care, supported by the disclosure and discussion of the study results.
The study will process personal data without the data subjects' consent.
Informing the data subjects is not considered justified because:
- The data processing does not harm the interests of the data subject. The output of the data processing is a descriptive generalization of the data.
- Asking for permission would involve unreasonably high resource costs. Obtaining informed consent would be logistically complex.
- There is an overriding public interest. Breast cancer is the most common malignant tumor in women and one of the main causes of morbidity and mortality. Early detection using mammography significantly improves treatment outcomes. Mammograaf Radioloogiakliinik is a well-known breast diagnostic center in Estonia, where both screening and diagnostic examinations are performed using modern methods and technologies. Conducting this study is important for assessing the quality of current work methods and analyzing the added value of artificial intelligence in radiological decision-making. Presenting the results to the wider medical community will help raise awareness of modern breast diagnostic options. The study and subsequent analyses have the potential to contribute significantly to the development of breast radiology work processes and technologies.
Because the data are anonymized at the mammogram assessment stage, any high- or medium-risk markings added by artificial intelligence to non-cancer mammograms in the dataset, and any BI-RADS ≥ 3 assessments made by readers during the study, cannot affect the assessments given in real clinical work or the treatment of patients.
2. Data protection aspects
Working with personal data is essential at the database creation stage, because data generated at different times must be combined from several sources, each of which requires the personal identification code:
- Mammograaf Radioloogiakliinik paper archive (BI-RADS 6 patient personal identification codes, biopsy results);
- Mammograaf Radioloogiakliinik Visage Imaging Picture Archiving and Communication System (PACS) (mammograms);
- Mammograaf Radioloogiakliinik Radiology Information System (RIS) Lisa (BI-RADS 6 patient personal identification codes, biopsy results);
- Mammograaf Radioloogiakliinik RIS Optomed (BI-RADS 1 and/or BI-RADS 2 patient personal identification codes).
The database is created and stored on a secure server of Mammograaf Radioloogiakliinik by the responsible researcher, who has access to these data sources through his/her work duties, authenticated with his/her ID card.
The mammogram sets belonging to a single cancer case are marked as described in Section 11 of the study protocol.
- To ensure the necessary confidentiality of the readers participating in the study, certain data that could disclose excessive information are not included in the application but are presented separately in Section 12 of the study protocol.
- The marking does not allow the identification of the subject. To create the marking, it is necessary to know the number of cancer cases, the number of sets and the dates of their execution. The marking is performed by HERA-MI.
- The responsible researcher links the marking to the created database and deletes from it the personal identification codes used in its creation, no later than two months after permission to conduct the study is received (if preliminary work begins in October 2025, the approximate deletion date is 1 January 2026). The marked database thereby becomes anonymous, and all further research activities will be carried out with anonymous data.
- Mammograms not related to cancer cases will be marked in random order with combinations of letters and numbers.
- Readers will clearly delineate findings on the mammograms in accordance with the principles of the BI-RADS classification.
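The coding of mammogram sets can be illustrated with random letter-and-digit study codes and a separate linking key that is deleted once the dataset is finalized. This is a hypothetical sketch only: the actual marking scheme (Section 11 of the protocol, performed by HERA-MI) is not described here, and `code_records`, `random_code` and the field names are assumptions.

```python
import secrets
import string
from typing import Dict, List, Tuple

ALPHABET = string.ascii_uppercase + string.digits


def random_code(length: int = 8) -> str:
    """Random letters-and-digits code drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def code_records(
    records: List[Dict[str, str]], id_field: str = "personal_code", length: int = 8
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Replace the personal identifier of each mammogram set with a random study code.

    Returns the coded records in shuffled order and the linking key
    (study code -> personal code), which is stored separately and deleted
    once the dataset is finalized.
    """
    key: Dict[str, str] = {}
    coded: List[Dict[str, str]] = []
    for rec in records:
        code = random_code(length)
        while code in key:  # regenerate on the (unlikely) collision
            code = random_code(length)
        key[code] = rec[id_field]
        coded_rec = {k: v for k, v in rec.items() if k != id_field}
        coded_rec["study_code"] = code
        coded.append(coded_rec)
    secrets.SystemRandom().shuffle(coded)  # present the sets in random order
    return coded, key
```

Until the linking key is deleted, the coded data are pseudonymized rather than anonymous; deleting the key is the step that makes them anonymous.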
The marked database will be stored on the secure server of Mammograaf Radioloogiakliinik. The study mammograms will be organized into collections in PACS, which readers access with personal passwords and with separate access to the artificial intelligence. Access to and interaction with the artificial intelligence are the same as in normal work. Mammograms are accompanied by the patient's age in years, which is routinely used in the assessment process. At the end of each reading phase, an Excel table containing the assessments is generated on the Mammograaf Radioloogiakliinik server.
The primary markings, related markings, dates, and clinical information of the cancer cases are stored in an Excel table accessible only to the responsible investigator.
Data analysis is performed by HERA-MI on the Mammograaf Radioloogiakliinik server. The results of the study are presented in summarized form.
The anonymous data used in the study are stored on the Mammograaf Radioloogiakliinik server for study purposes indefinitely.
Data collection and management
- Data sources
- PACS - digital imaging materials (mammograms)
- Institutional paper archive - personal identification codes, biopsy results, and radiologist descriptions.
- RIS - patient personal identification codes, biopsy results, and radiologist descriptions in cases where paper data is not available.
- Data collected
  - Demographics: age.
  - Image data: BI-RADS score; location of findings; mass shape, margins and density; presence of calcifications; asymmetry; architectural distortion.
  - Pathology: year of cancer detection, tumor type (invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), ductal carcinoma in situ (DCIS), etc.), receptor status (ER, PR, HER2), grade, tumor size.
  - Image quality: mammography device, compression force and compression pressure, positioning quality.
- Data handling
All data collected during the study will be handled in accordance with applicable data protection requirements and the principles set out in the study protocol.
The data will be processed as follows:
(1) Personal data will be removed by anonymization before analysis.
(2) A unique identifier will be created for each case and each reader, enabling data traceability without identifying individuals.
(3) Raw data - including mammograms, reader markings and AI outputs - will be securely stored on the Mammograaf Radioloogiakliinik server. Only the responsible investigator will have access to the data.
(4) Analysis data will be exported in a structured format (e.g. Excel, CSV, Digital Imaging and Communications in Medicine (DICOM) metadata) that allows statistical processing and quality control.
(5) All changes in data collection or processing will be documented to ensure auditability.
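A structured CSV export of the collected variables might look like the following. The `CaseRecord` fields, their names, and the units (newtons for compression force, kilopascals for compression pressure) are assumptions for illustration, not the study's actual data dictionary; `to_csv` uses only the Python standard library.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class CaseRecord:
    study_code: str
    age: int
    birads: int
    finding_location: Optional[str] = None
    tumor_type: Optional[str] = None  # e.g. "IDC", "ILC", "DCIS"; None for non-cancer sets
    receptor_status: Optional[str] = None  # e.g. "ER+/PR+/HER2-"
    compression_force_n: Optional[float] = None  # assumed unit: newtons
    compression_pressure_kpa: Optional[float] = None  # assumed unit: kilopascals
    positioning_quality: Optional[str] = None


def to_csv(records: List[CaseRecord]) -> str:
    """Serialize records to CSV text; None values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CaseRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


print(to_csv([CaseRecord("A7K2Q9", 54, 2), CaseRecord("C3M8T1", 61, 5, tumor_type="IDC")]))
```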
Software setup
- Software for viewing mammograms: the study uses Visage Imaging PACS software for viewing mammograms and for creating and saving ROI markings.
- Randomization and anonymization programs: the randomization and anonymization programs for the study were provided by Hera-MI.
- CAD software: the study uses the Breast-SlimView CAD software from Hera-MI.
The software generates CAD cues in the form of masks: the regions of interest (ROIs) identified by the artificial intelligence are placed on a mask that matches the layout of the mammogram.
The CAD results are integrated into the viewing system's hanging protocol.
Study Design
- Study Type
- Observational
- Observational Model
- Case Control
- Time Perspective
- Retrospective
Eligibility Criteria
- Ages
- 30 Years to — (Adult, Older Adult)
- Sex
- Female
- Accepts Healthy Volunteers
- No
Inclusion Criteria
- Not provided
Exclusion Criteria
- Not provided
Arms & Interventions
BI-RADS 1-2 patients
BI-RADS 6 patients
Outcomes
Primary Outcomes
Standalone reader sensitivity
Time Frame: through study completion, an average of 1 year
Standalone CAD sensitivity
Time Frame: through study completion, an average of 1 year
Augmented Reader sensitivity
Time Frame: through study completion, an average of 1 year
Standalone reader detection-to-diagnosis time (MAE)
Time Frame: through study completion, an average of 1 year
Augmented reader detection-to-diagnosis time (MAE)
Time Frame: through study completion, an average of 1 year
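Because standalone and AI-augmented readings are made on the same cancer cases, a paired test is a natural way to compare their sensitivities. The sketch below uses the exact (binomial) McNemar test on discordant pairs; this is an assumption, since the registry names the McNemar test only in the sample-size calculation, and `mcnemar_exact` is a hypothetical helper.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(standalone: Sequence[bool], augmented: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired detections of the same cancers.

    Each element is True if that cancer was detected in the given reading condition.
    """
    if len(standalone) != len(augmented):
        raise ValueError("paired sequences must have equal length")
    b = sum(s and not a for s, a in zip(standalone, augmented))  # detected only without AI
    c = sum(a and not s for s, a in zip(standalone, augmented))  # detected only with AI
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 the discordant pairs split 50/50; double the smaller binomial tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Only the discordant cancers (found in one condition but not the other) carry information about the difference; concordant cases do not affect the p-value.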
Secondary Outcomes
- Intra-reader agreement (through study completion, an average of 1 year)
- Standalone reader specificity (through study completion, an average of 1 year)
- Augmented reader specificity (through study completion, an average of 1 year)
- Inter-reader agreement (through study completion, an average of 1 year)
- Standalone reader markings region-wise annotation precision (through study completion, an average of 1 year)
- Augmented reader markings region-wise annotation precision (through study completion, an average of 1 year)
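The registry does not name the agreement statistic. As one common choice, Cohen's kappa between two readers (or between two sessions of the same reader, for intra-reader agreement) can be computed as follows; this is an assumed method, `cohens_kappa` is a hypothetical helper, and a weighted kappa may be preferable for ordinal BI-RADS scores.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Unweighted Cohen's kappa between two sets of BI-RADS scores on the same cases."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    if expected == 1:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (observed - expected) / (1 - expected)
```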
Investigators
Marina Astapova
MD
Mammograaf Radioloogiakliinik