Machine Learning and Pregnancy Success Prediction in Fertility Treatments
- Conditions
- Infertility (IVF Patients)
- Registration Number
- NCT06884930
- Lead Sponsor
- IRCCS San Raffaele
- Brief Summary
Infertility, as defined by the World Health Organization (WHO), is a disorder of the male or female reproductive system characterized by the inability to achieve a clinical pregnancy after 12 months or more of regular, unprotected sexual intercourse. In modern fertility treatment, assisted reproductive technologies (ART), including in vitro fertilization (IVF), have become a standard approach for addressing complex fertility issues and sterility. In Italy, infertility affects approximately 16.5% of couples.
Despite advancements in ART, pregnancies achieved through ART in Italy still differ significantly from spontaneous pregnancies, particularly in success rates, miscarriage rates, and embryo implantation outcomes.
In this context, AI-based models have shown promising potential in predicting IVF success by analyzing complex datasets that include patient demographics, hormonal levels, and embryo morphology. Research indicates that AI can enhance embryo selection, predict the optimal timing for embryo transfer, and advance personalized medicine approaches in reproductive health.
This study aims to use machine learning to identify patterns and factors associated with successful pregnancy outcomes by analyzing large-scale, anonymized ART data. The resulting predictive model could enable clinicians to better personalize treatment protocols for each patient, optimizing medication dosages, timing, and embryo selection. It could also improve pregnancy success rates while reducing the emotional and financial burden on patients, thus advancing the standard of care in ART.
- Detailed Description
This multicentric, observational, retrospective, non-profit study, coordinated by the IRCCS San Raffaele Hospital, aims to analyze anonymized data collected between 2019 and 2024 from approximately 5,000 couples undergoing Assisted Reproductive Technology (ART) procedures across three participating centers. The study will examine key variables, including age, medical history, treatment protocols, ART techniques (such as In Vitro Fertilization [IVF] and Intracytoplasmic Sperm Injection [ICSI]), embryo quality, and pregnancy outcomes, to develop a machine learning-based predictive model for pregnancy outcomes. The selected timeframe ensures a sufficiently large dataset to facilitate robust development and validation of the predictive model.
By leveraging machine learning techniques, this study aims to enhance the accuracy of pregnancy outcome predictions, thereby improving patient counseling and treatment planning in ART procedures. The comprehensive dataset, encompassing a diverse range of variables and a substantial number of cases, will provide a robust foundation for developing a predictive model with high clinical applicability.
The primary objective of this study is to develop a Machine Learning-based predictive model for pregnancy outcomes in assisted reproductive technologies (ART), by analyzing large-scale, anonymized data, for scientific research purposes. The model aims to identify key patterns and factors that correlate with successful pregnancy outcomes to optimize individualized treatment protocols for patients undergoing ART.
SAMPLE SIZE:
The sample size will be approximately 5,000 pairs of subjects (women + men) based on the total number of ART cycles recorded at the participating centers during this period and the number of patients with complete data records that provide sufficient information for analysis. We expect approximately 1,650 pairs in the "success" class and 3,350 in the "unsuccess" class of IVF treatment outcomes. Thus, the Machine Learning-based predictive model could be trained using a multi-parametric approach with a balanced set of 1,350 pairs of subjects, using the remaining pairs to test the performance of the model.
The minimum sample size for the retrospective study should be 295 pairs of subjects, calculated to yield a 95% confidence interval of ± 2.5% around an expected sensitivity of 94% and an expected specificity of 15% of the prediction model, with a prevalence of IVF treatment success of 30% and a dropout rate of 2%. The success prevalence is based on the clinical site's experience with IVF treatments; the dropout rate is set low, at 2%, because this is a retrospective, software-based study in which the only residual risk is that some complete records prove invalid. Sensitivity, specificity, and positive and negative predictive values will be calculated with their 95% confidence intervals.
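The record does not state which formula produced the minimum sample size. One common choice for diagnostic-accuracy studies is Buderer's formula, which sizes the study so that sensitivity and specificity each reach a target confidence-interval half-width, then inflates the result for dropout. The sketch below assumes that formula; the function names `buderer_n` and `minimum_pairs` are illustrative and not taken from the study.

```python
import math
from statistics import NormalDist


def buderer_n(expected: float, half_width: float, class_fraction: float,
              confidence: float = 0.95) -> float:
    """Subjects needed for a proportion to reach a given CI half-width.

    class_fraction is the share of subjects on which the proportion is
    measured: prevalence for sensitivity, 1 - prevalence for specificity.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z ** 2 * expected * (1 - expected) / (half_width ** 2 * class_fraction)


def minimum_pairs(sensitivity: float, specificity: float, prevalence: float,
                  half_width: float, dropout: float) -> int:
    """Larger of the two requirements, inflated for the expected dropout."""
    n_sens = buderer_n(sensitivity, half_width, prevalence)
    n_spec = buderer_n(specificity, half_width, 1 - prevalence)
    return math.ceil(max(n_sens, n_spec) / (1 - dropout))


# Inputs taken from the text: sensitivity 94%, specificity 15%,
# prevalence 30%, dropout 2%. The half-width is varied as an assumption.
print(minimum_pairs(0.94, 0.15, 0.30, half_width=0.05, dropout=0.02))
print(minimum_pairs(0.94, 0.15, 0.30, half_width=0.025, dropout=0.02))
```

Under these assumptions, a half-width of 0.05 gives 295 pairs, matching the figure above, while a half-width of 0.025 gives 1,180, so the study's own calculation may follow a different convention for the interval width than this sketch does.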
Considering the minimum sample size and the available number of samples, we expect to achieve statistical significance from the hypothesis testing and to obtain a multi-modal signature of predictors of IVF success.
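As an illustration of how sensitivity, specificity, and the predictive values can be reported with 95% confidence intervals, the sketch below derives them from a confusion matrix. The Wilson score interval is an assumption here, since the record does not name an interval method, and the counts in the example are invented for demonstration, not study data.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - margin, centre + margin


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and 95% interval for each metric, keyed by name."""
    counts = {
        "sensitivity": (tp, tp + fn),  # true positives among actual successes
        "specificity": (tn, tn + fp),  # true negatives among actual failures
        "ppv": (tp, tp + fp),          # true positives among predicted successes
        "npv": (tn, tn + fn),          # true negatives among predicted failures
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in counts.items()}


# Invented counts, for illustration only.
for name, (estimate, (low, high)) in diagnostic_metrics(tp=90, fp=30, fn=10, tn=70).items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```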
STATISTICAL DESIGN:
A structured methodology will be employed to develop and assess machine learning models for binary classification tasks, specifically distinguishing between "Success" and "Not Success." The process will commence with the selection of informative and non-redundant features, eliminating those with low variance and high correlation. Subsequently, three distinct classifier models, namely Random Forest, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN), will be trained and evaluated using k-fold cross-validation to ensure robust performance assessment. To address potential data imbalances, the ADASYN technique will be applied, generating synthetic samples for the minority class. Model performance will be quantified using various metrics, including accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC), to identify the most effective model. Finally, a statistical analysis of the most pertinent features will be conducted using non-parametric tests and corrections for multiple comparisons, aiming to elucidate class differences and ensure result reliability.
This structured approach will ensure that the models are meticulously tuned and validated through rigorous testing and analysis, leading to accurate and reliable machine learning models for binary classification tasks.
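The first step of the design, removing features with low variance and features that are highly correlated with one another, could be sketched as follows. The thresholds, the greedy keep-first rule, the function name `select_features`, and the example feature names and values are all assumptions for illustration; the record does not specify them.

```python
import math
from statistics import fmean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, min_variance=0.01, max_abs_corr=0.9):
    """Return the names of features that survive both filters.

    columns maps a feature name to its list of numeric values.
    """
    # Step 1: drop near-constant features (hypothetical threshold).
    varied = {name: values for name, values in columns.items()
              if pvariance(values) > min_variance}
    # Step 2: keep a feature only if it is not highly correlated
    # with any feature that has already been kept.
    selected = []
    for name, values in varied.items():
        if all(abs(pearson(values, varied[kept])) <= max_abs_corr for kept in selected):
            selected.append(name)
    return selected


# Invented values, for illustration only.
example = {
    "age": [30, 35, 40, 28, 33],
    "age_in_months": [360, 420, 480, 336, 396],  # redundant with "age"
    "hormone_level": [2.1, 0.8, 1.5, 3.0, 1.2],
    "constant_flag": [1, 1, 1, 1, 1],            # zero variance
}
print(select_features(example))
```

In this example the redundant and constant columns are dropped and `age` and `hormone_level` remain. Because the rule keeps whichever correlated feature comes first, the result depends on column order, which is one reason a real analysis might prefer a different tie-breaking criterion.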
INFORMED CONSENT AND DATA PROTECTION:
In accordance with data protection regulations, the study will utilize anonymized data previously collected through routine clinical practice and stored in MedITEX IVF, a management software used at the participating assisted reproduction centers. No direct patient interaction or intervention will occur as part of the study. All data will be anonymized following best practice guidelines to ensure patient confidentiality, adhering to ethical standards and applicable data privacy regulations.
The Investigator (or the Center receiving the data) commits to processing the data solely for the purposes of the study, storing it in a secure network system, and restricting access to authorized personnel who have signed confidentiality agreements. If external suppliers are involved, they will be appointed as Data Processors with appropriate agreements in place. The Investigator will also facilitate the exercise of data subject rights, including access, rectification, erasure, restriction of processing, objection, and portability, within 30 days of receiving the relevant request. If data are communicated outside the institution in pseudonymized form, efforts will be made to prevent the identification of data subjects. Within 30 days following the end of the study, the Investigator will ensure the deletion or irreversible anonymization of the communicated data and promptly confirm this in writing.
A study-specific Data Protection Impact Assessment (DPIA), reviewed by the Data Protection Officer (DPO) of the coordinating institution, has been conducted in accordance with applicable data protection laws.
Recruitment & Eligibility
- Status
- NOT_YET_RECRUITING
- Sex
- All
- Target Recruitment
- 5000
- Inclusion Criteria
- Patients who underwent ART procedures, including IVF and ICSI, between 2019 and 2024.
- Women aged between 18 and 43 years.
- Exclusion Criteria
- Patients with incomplete or missing data records that do not provide sufficient information for analysis.
- Women outside the 18 to 43 age range.
Study & Design
- Study Type
- OBSERVATIONAL
- Study Design
- Not specified
- Primary Outcome Measures
Name: Pregnancy rate
Time: Data will be extracted for all ART cycles conducted between 2019 and 2024 to allow for the comprehensive development of the Machine Learning-based model.
Method: The primary endpoint of the study will be the clinical pregnancy defined as a pregnancy confirmed by an increasing level of hCG and the presence of a gestational sac or heartbeat detected by ultrasound.
- Secondary Outcome Measures
Name Time Method