Artificial Intelligence-Based Machine Learning to Diagnose and Classify Adenomyosis from Ultrasound Scans: a Multicentre Model Development Study
- Conditions
- Adenomyosis of Uterus
- Registration Number
- NCT06765512
- Lead Sponsor
- CARE Fertility UK
- Brief Summary
The aim of this study is to use a large dataset of annotated ultrasound images of the normal uterus and of adenomyosis of varying severity to train a neural network using a deep learning framework (PyTorch) and an automated machine learning tool (Vertex AI). The main questions it aims to answer are:
1. The diagnostic performance of the automated (Google Vertex AI Vision) and deep learning (PyTorch) machine learning models
2. Time saved in assessment of adenomyosis per healthcare professional
- Detailed Description
Background: Adenomyosis is a benign gynaecological condition characterised by the presence of ectopic endometrial cells (cells of the lining of the womb) in the myometrium (the muscle layer of the womb). Although adenomyosis is prevalent in women and girls across all stages of life, from adolescence to perimenopause, it remains underdiagnosed globally. Women with adenomyosis experience debilitating symptoms of heavy menstrual bleeding, painful menstrual cycles, chronic pelvic pain, and adverse pregnancy outcomes, leading to a long-term impact on their quality of life.
Diagnosis and classification of adenomyosis are essential not only to correlate with symptom severity but also to counsel women about disease prognosis, the type of management and the long-term impact of adenomyosis on outcomes. However, classifying the severity is a time-consuming and complex process that depends on operator experience and remains subjective. Achieving the same results in a fraction of the time, with reduced operator bias and independent of operator expertise, would likely be beneficial.
For image classification tasks, machine learning (ML) has enormous potential to assist healthcare professionals in classifying the severity of adenomyosis objectively in seconds. Deep learning frameworks and automated machine learning tools are state-of-the-art approaches for automated transvaginal ultrasound scan (TVUS) image analysis. They can directly process ultrasound images and automatically learn mid- to high-level abstract features using a deep architecture model, without requiring the manual definition of features in advance. This makes it possible to train, validate and test a machine learning model with labelled example images for classification.
The aim of the study is to use the annotated ultrasound images of the normal uterus and of adenomyosis of varying severity to train a neural network using supervised deep learning that predicts whether an input image belongs to one of the following classes: none, mild, moderate or severe adenomyosis. This is a novel project and may in future support clinicians and sonographers in grading the severity of adenomyosis in a fraction of the time.
Objective: To investigate the use of an image recognition model, built with supervised deep learning via an automated machine learning tool (Vertex AI) and a deep learning framework (PyTorch), to create an algorithm for ultrasound classification of the severity of adenomyosis.
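To illustrate the supervised deep learning approach described above, the following is a minimal sketch of a four-class PyTorch image classifier. The backbone (a torchvision ResNet-18), the image size, the folder layout and all hyperparameters are assumptions made for illustration; they are not the study's specified configuration.

```python
# Minimal PyTorch sketch (illustrative only; not the study's actual pipeline).
# Assumptions: torchvision ResNet-18 backbone, 224x224 inputs, and images
# arranged in folders named after the four classes.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

CLASSES = ["none", "mild", "moderate", "severe"]

# Grey-scale ultrasound frames are replicated to 3 channels to fit the backbone.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed directory layout: train/<class name>/<image>.png
train_data = datasets.ImageFolder("train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

# Pretrained backbone with the final layer replaced by a 4-class head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(10):  # illustrative number of epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```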
Study design and settings:
Type of study: This will be a multicentre observational cohort study. The APPRAISE-AI and Prediction model Risk Of Bias ASsessment Tool - Artificial Intelligence (PROBAST-AI) tools were consulted during protocol development.
Settings: The ultrasound images of patients with normal uterus and adenomyosis will be retrieved from ten CARE fertility centres in the United Kingdom. This would include CARE Fertility sites at Birmingham, Tamworth, Nottingham, Northampton, Sheffield, Leeds, Manchester, Bolton, Cheshire and Liverpool.
Population: The study cohort will comprise ultrasound images of patients who attended CARE Fertility centres for ultrasound between February 2022 and February 2024 and were diagnosed with a normal uterus or adenomyosis using the Morphological Uterus Sonographic Assessment (MUSA) criteria on screening of the images. The MUSA criteria outline the ultrasound features of the myometrium and myometrial lesions using standardised terms, definitions and measurements.
A previously published schematic mapping system of adenomyosis severity will be used to determine the severity of uterine adenomyosis on review of the images. This system has been chosen because of its reproducibility, substantial-to-almost-perfect interobserver agreement and clinical correlation with symptom severity. A score ranging from 1 to 4 is attributed to each grade, and the sum of the scores is used to calculate the extension of the disease: mild (1-3), moderate (4-6), and severe (>7).
Duration: Ultrasound performed between February 2022 and February 2024.
Sample size: The minimum number of images required by Vertex AI Vision for training is 100 per category, and the likelihood of successfully recognising a category increases with the number of high-quality examples per category. The target sample size to test the algorithm to classify adenomyosis will be between 1,000 and 10,000 images each for none, mild, moderate and severe adenomyosis. Ideally, the complete set of images would be distributed equally across the classification categories; however, it may not be possible to source an approximately equal number of images for each category. The complete dataset will be split manually into two datasets in a 9:1 ratio: 90% of the selected images will be used as the training dataset (training + validation) and 10% as the test dataset. This will ensure adequate inclusion of diverse and representative images for each category. The test dataset will be an independent dataset that is not used in the training phase but will be obtained from the same facilities as the data used in the training phase.
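As a concrete illustration of the quoted score-to-category mapping, a small helper is sketched below. Mapping a total of 0 to "none" and a total of exactly 7 to "severe" are assumptions of this sketch, since the quoted ranges do not state them explicitly.

```python
# Illustrative helper only: maps a summed severity score to the categories
# quoted above (mild 1-3, moderate 4-6, severe >7). Mapping 0 to "none" and
# a total of exactly 7 to "severe" are assumptions of this sketch.
def severity_category(total_score: int) -> str:
    if total_score <= 0:
        return "none"
    if total_score <= 3:
        return "mild"
    if total_score <= 6:
        return "moderate"
    return "severe"

# Example: severity_category(5) returns "moderate".
```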
Minimisation of the potential impact of biases: All the stored ultrasound images of normal and adenomyotic uteri will be screened by a reviewer experienced in gynaecological and fertility scanning, who will be blinded to the baseline characteristics of the women to avoid identification and recruitment bias. Two reviewers will independently classify the severity of adenomyosis as mild, moderate or severe for 1,000 sets of images to establish concordance.
AI software details:
Type of machine learning: Supervised deep learning. In supervised learning, the machine needs labelled examples to learn. The aim of the study is to teach a machine to recognise varying severity of ultrasound-diagnosed adenomyosis. Ultrasound scan images labelled as "None", "Mild", "Moderate" and "Severe" will be uploaded as input. By studying these examples, the algorithm will learn what distinguishes a normal uterus from mild adenomyosis, and a mild from a severe degree of adenomyosis, and will assign the correct classification to each new image it is asked to analyse.
Type of framework and version: Google Cloud Vertex AI Vision API v1 as the automated machine learning tool and the most up-to-date version of PyTorch as the deep learning framework.
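A hedged sketch of how an AutoML image classification run could be launched with the google-cloud-aiplatform Python SDK is shown below. The project, region, bucket, manifest name and training budget are placeholders, and the fraction-based split is shown only for brevity; the protocol specifies a manual split, which Vertex AI can also accept via a data-split column in the import file.

```python
# Hedged sketch of Vertex AI AutoML image classification (placeholders throughout).
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west2")  # placeholder project/region

# The CSV manifest lists gs:// image URIs with their labels
# (none / mild / moderate / severe).
dataset = aiplatform.ImageDataset.create(
    display_name="adenomyosis-tvus",
    gcs_source="gs://my-bucket/adenomyosis_labels.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="adenomyosis-automl",
    prediction_type="classification",
    multi_label=False,
)

model = job.run(
    dataset=dataset,
    training_fraction_split=0.8,   # shown for brevity; the protocol uses a manual split
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    budget_milli_node_hours=8000,  # example budget, not a protocol value
)
```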
Acquisition of input data and selection: Input data for the model will be retrieved from ultrasound images of patients who attended the above-mentioned fertility centres for ultrasound between February 2022 and February 2024 and were diagnosed with a normal uterus or adenomyosis on screening of the images. Eligible ultrasound images of adenomyotic uteri will be extracted as two-dimensional (2D) and/or three-dimensional (3D) images after removing the patient identifiers. If a patient has more than one image of an adenomyotic or normal uterus, all of these images will be included. The images will be reviewed by a reviewer to reduce the risk of bias and formatted to ensure that they contain the adenomyosis ultrasound characteristic of interest. The reviewed and formatted images will be labelled as none, mild, moderate or severe by two reviewers experienced in ultrasound diagnosis and classification of adenomyosis. Three sets of input images will be uploaded to the machine learning framework:
1. Training set: 80% of the image data will be in the training set. The model "sees" and initially learns from this data.
2. Validation set: The validation set is also part of the training process, but it is kept separate to tune the model's hyperparameters. 10% of the total images, distinct from the training data, will be used to fine-tune the model structure so that it generalises better.
3. Test set: The test set enters the stage only after the training process. It will be used to test the performance of the model on data it has not yet seen. 10% of the total images will be used as the test set. Unlabelled test images will be uploaded to the system and the performance of the artificial intelligence (AI) model will be compared with the reference standard, i.e. the original classification by the investigators. Splitting of the total image data into the training, validation and test sets will be done manually, as sketched below.
Handling of poor-quality or unavailable input data: such images will be excluded.
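A minimal sketch of the manual 80/10/10 split, stratified by severity label so that every category appears in all three sets, is given below; the file names and labels are placeholders, not study data.

```python
# Illustrative stratified 80/10/10 split (placeholders, not study data).
from sklearn.model_selection import train_test_split

paths = [f"img_{i:04d}.png" for i in range(400)]         # placeholder de-identified file names
labels = ["none", "mild", "moderate", "severe"] * 100    # placeholder labels

# First hold out 20%, then divide the hold-out equally into validation and test.
train_p, hold_p, train_l, hold_l = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)
val_p, test_p, val_l, test_l = train_test_split(
    hold_p, hold_l, test_size=0.5, stratify=hold_l, random_state=42)
```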
Eligibility criteria:
Inclusion:
A. Participants: Women attending the above-mentioned fertility centres for ultrasound between February 2022 and February 2024 for any indication who are diagnosed with a normal uterus or adenomyosis on screening of the images.
B. Input data: Good-quality and conclusive 2D and/or 3D images of the normal uterus and adenomyotic uterus where the ultrasound characteristics of adenomyosis are clearly visible.
Exclusion:
A. Participants: Women with co-existing single or multiple intramural fibroids and endometrial cavity abnormalities.
B. Input data: Ultrasound images that are inconclusive on assessment by the reviewer, poor-quality images where the ultrasound characteristics of adenomyosis are unclear, and images that cannot be classified into one of the four categories will be excluded.
Consent:
The study is purely observational, with no intervention or additional step being conducted as part of the study. Ultrasound images of normal and adenomyotic uteri will have been acquired at the time of a woman having a TVUS, which is part of routine clinical practice. Since the ultrasound machine allows the export of images without patient-identifiable information, the data (which include ultrasound images only) will have patient-identifiable information (name and date of birth) removed before the data are extracted. The NHS Health Research Authority Confidentiality Advisory Group (CAG) pre-application checklist was consulted to determine the eligibility of this project to process patient information without consent. Hence, no informed consent will be requested from the study participants, as only anonymised data that are already available will be used.
Outcome and statistical analysis
Primary outcomes: Average precision, precision (positive predictive value), recall, area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy of the model will be the primary outcomes of interest for the automated and deep learning models. All analyses will be done using Stata Statistical Software (Release 18, TX, USA) for the deep learning (ML) framework model and using the automated machine learning software for the automated ML model. After the automated model is trained, a detailed analysis consisting of a summary of the model's performance will be reported (model output). This will include the following (an illustrative computation sketch is given after the list below):
* Score threshold: The score threshold refers to the level of confidence the model must have to assign a category to a test item. The score threshold slider in the Google Cloud console is a visual tool to test the effect of different thresholds for all categories and individual categories in the dataset. A score threshold which provides a good balance between false positives and false negatives will be adopted during the evaluation.
* True positives, true negatives, false positives, and false negatives
* Precision and recall: Precision indicates, of all the test images assigned a given label, how many truly belonged to that category; recall indicates, of all the test images that truly belong to a category, how many were assigned that label. The reference standard label refers to the categorisation by the investigators.
* Precision/recall curves
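The sketch below illustrates how the listed metrics could be computed for the PyTorch model with scikit-learn, using a one-vs-rest convention for the four classes and a simple score threshold; Vertex AI reports its own equivalents in the console. The arrays are toy placeholders, not study data.

```python
# Illustrative metric computation (toy data; one-vs-rest over the four classes).
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)

y_true = np.array([0, 1, 2, 3, 0, 2, 1, 3])                      # reference labels (toy)
y_prob = np.random.default_rng(0).dirichlet(np.ones(4), size=8)  # model confidence scores (toy)

# Score threshold: accept the top prediction only when its confidence exceeds it.
threshold = 0.5
y_pred = y_prob.argmax(axis=1)
confident = y_prob.max(axis=1) >= threshold  # below-threshold items could be flagged for review

accuracy = accuracy_score(y_true, y_pred)
precision, recall, _, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2, 3], zero_division=0)
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")

# Per-class sensitivity equals recall; specificity is derived from the confusion matrix.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
tn = cm.sum() - (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm))
fp = cm.sum(axis=0) - np.diag(cm)
specificity = tn / (tn + fp)
```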
Secondary outcome: Evaluation of interrater agreement between the two reviewers in the ultrasound assessment of adenomyosis using Fleiss' kappa. Kappa coefficients will be interpreted as: ≤0 = poor, 0.01-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, and 0.81-1.00 = almost perfect.
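A brief sketch of the agreement calculation using the statsmodels implementation of Fleiss' kappa is shown below; the two reviewers' ratings are toy placeholders.

```python
# Illustrative Fleiss' kappa calculation (toy ratings, codes 0-3 = none..severe).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per image, one column per reviewer.
ratings = np.array([[0, 0], [1, 1], [2, 1], [3, 3], [2, 2], [1, 2]])

table, _ = aggregate_raters(ratings)          # per-image counts in each category
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.2f}")
```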
Data handling and record keeping: Data (de-identified ultrasound images) will be stored in Google Cloud in accordance with the Data Protection Act 2018.
Eligible ultrasound images of normal and adenomyotic uteri will be extracted as 2D and/or 3D images after removing the patient identifiers. After completion of the study, images stored on Google Cloud will be deleted.
Quality control and assurance: Principles of Good Clinical Practice will be followed throughout the study duration.
Adverse event reporting: Given the observational nature of the study and the absence of any additional step in the study design beyond the baseline TVUS, there are no safety considerations.
Ethical Considerations:
Ethics approval has been obtained from the local Institutional Review Board (IRB) and from CARE Fertility research and development team.
Confidentiality and data protection: A unique image identification number will be used for identification of study participants. All personal data used in the study will be processed in accordance with the Data Protection Act 2018 and will be regarded as strictly confidential.
Funding: Funds for this project have been sourced from Tommy's National Centre for Miscarriage Research, Birmingham, United Kingdom.
Conflicts of interest: There are no conflicts of interest associated with this study.
Amendments: Any amendments to the study protocol or documentation will be initiated by the Chief Investigator. All substantial amendments will be submitted to IRB for approval followed by implementation of the changes in all sites involved in the study. Amendments will be documented in the 'Protocol Amendment' section of the study protocol.
End of project definition: Data gathering and preparation, training and evaluation of the model and writing of manuscript will take place over a period of 12 months.
Future study aspects: The long-term aim is to take this internally validated AI model forward for external validation in external National Health Service (NHS) centres using the PACS system of archived images and to correlate it with the histopathology of hysterectomy specimens. This would also involve analysis of clinical utility by assessing the impact of the model on decision making and outcomes. If the model's performance is appropriate in terms of positive predictive value, precision and clinical utility, it will be implemented for real-time use via online prediction. The model may be deployed to serve classification requests as an online application or as an application integrated within ultrasound machines.
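If the model were taken forward to online prediction, one possible request path using the Vertex AI SDK is sketched below; the model resource name, endpoint settings and confidence threshold are illustrative assumptions, not deployment decisions made in this protocol.

```python
# Hedged sketch of Vertex AI online prediction (placeholders throughout).
import base64
from google.cloud import aiplatform

# Placeholder resource name of a trained AutoML image model.
model = aiplatform.Model("projects/my-project/locations/europe-west2/models/1234567890")
endpoint = model.deploy(min_replica_count=1, max_replica_count=1)

with open("new_scan.png", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

response = endpoint.predict(
    instances=[{"content": content}],
    parameters={"confidenceThreshold": 0.5, "maxPredictions": 4},
)
print(response.predictions)  # predicted display names with confidence scores
```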
Publication policy: The results of the study will be shared through scientific and social media. The study results will be submitted to a peer-reviewed journal in the field of gynaecological ultrasound (Ultrasound in Obstetrics and Gynecology (UOG)) or digital health (The Lancet Digital Health). The manuscript will be prepared by Dr Ishita Mishra, and authorship will be determined according to each individual's degree of contribution to the study and to writing the manuscript.
Recruitment & Eligibility
- Status
- ACTIVE_NOT_RECRUITING
- Sex
- Female
- Target Recruitment
- 10000
- Participants: Women attending CARE Fertility centres for ultrasound between February 2022 and February 2024 for any indication who are diagnosed with a normal uterus or adenomyosis on screening of the images.
- Input data: Good-quality and conclusive 2D and/or 3D images of the normal uterus and adenomyotic uterus where the ultrasound characteristics of adenomyosis are clearly visible.
- Participants: Women with co-existing single or multiple intramural fibroids and endometrial cavity abnormalities.
- Input data: Ultrasound images that are inconclusive on assessment by the second reviewer, poor-quality images where the ultrasound characteristics of adenomyosis are unclear, and images that cannot be classified into one of the four categories will be excluded.
Study & Design
- Study Type
- OBSERVATIONAL
- Study Design
- Not specified
- Primary Outcome Measures
- Name: Precision. Time: Through study completion, an average of 1 year. Method: Precision will indicate how well the model captures information and how much it leaves out. Of all the test images that were assigned a category, precision indicates how many were truly supposed to be categorised with that label.
- Name: Recall. Time: Through study completion, an average of 1 year. Method: Of all the test images that should have been assigned a label, recall indicates how many were actually assigned that label.
- Name: Average precision. Time: Through study completion, an average of 1 year. Method: Model accuracy will be determined by the area under the precision-recall curve; in Vertex AI this metric is called average precision. It measures how well the model performs across all score thresholds. The closer the score is to 1, the better the model's performance on the test set.
- Name: Number of images correctly identified as per their original classification. Time: Through study completion, an average of 1 year. Method: Images correctly identified as per their original classification label (e.g. a mild-labelled image of adenomyosis identified as mild) will be true positives; images incorrectly identified and not in line with the original classification (e.g. a moderate-labelled image of adenomyosis classified as mild) will be false positives.
- Secondary Outcome Measures
- Name: Interrater agreement. Time: Through study completion, an average of 1 year. Method: Evaluation of interrater agreement between the two reviewers in the ultrasound assessment of adenomyosis using Fleiss' kappa for a defined set of images. Kappa coefficients will be interpreted as: ≤0 = poor, 0.01-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, and 0.81-1.00 = almost perfect.
- Name: Time taken to classify the full set of images. Time: Through study completion, an average of 1 year. Method: Number of days/hours taken to classify all the images by healthcare professionals and by the machine learning framework. This will determine the time saved in the assessment of adenomyosis per healthcare professional.
Trial Locations
- Locations (1)
CARE Fertility
🇬🇧Birmingham, England, United Kingdom