MedPath

Lung Cancer Multi-omics Digital Human Avatars for Integrating Precision Medicine Into Clinical Practice

Not yet recruiting
Conditions
Non Small Cell Lung Cancer
Interventions
Procedure: surgical resection
Registration Number
NCT05802771
Lead Sponsor
Fondazione Policlinico Universitario Agostino Gemelli IRCCS
Brief Summary

The goal of this multi-centric observational clinical trial is to develop accurate predictive models for lung cancer patients through the creation of Digital Human Avatars, using various omics-based variables and integrating well-established clinical factors with "big data" and advanced imaging features.

The main goals of the LANTERN project are:

* To develop prevention models for early lung cancer diagnosis;

* To set up personalized predictive models for individual-specific treatments.

Lung cancer patients will be prospectively enrolled, and the main omics data (including radiomics and genomics) will be collected, reflecting the omics domains associated with the lung cancer diagnosis and decision-making pathway.

An exploratory analysis across all collected datasets will select a pool of potential biomarkers to create multiple distinct multivariate models, trained through advanced machine learning (ML) and AI techniques and subdivided into specific areas of interest. Finally, the developed predictive models will be validated in order to test their robustness, transferability and generalizability, leading to the development of the Digital Human Avatar.

Detailed Description

Patient enrolment and omics data collection

The objective of this work package (WP) is to gather information from all the clinical and omics-based data sources considered clinically significant for decision support in the comprehensive lung cancer diagnosis and therapy workflow. A structured terminological system will be developed for prospective data collection through specific Case Report Forms (CRFs).

Patients will be enrolled by the dedicated research enrolment centres, and data obtained from the five omics-based variables will be collected and recorded in a secure database.

Omics data archiving and inter-actionability

The main aim of this WP is to allow complete data integration into both existing and new archiving systems and to ensure easy and effective use and sharing of the collected omics data.

All collected data representing the different omics domains under consideration will be recorded according to a shared ontology. This ontology will provide a structured terminological system for data archiving and analysis, in which all the different omics domains will be recorded in a specific eCRF, ensuring coherence across all collected data variables. Finally, the collected omics-related data will undergo radiomic analysis, and radiomic features will be extracted.

Omics data modelling, Digital Human avatar (DHA) creation and validation

This WP is focused on developing accurate predictive models, by creating Digital Human Avatars (DHAs), and on their validation. The purpose of this WP is to identify effective primary biomarkers, harmonize them through compact statistical models and subsequently create DHAs that are unique to each patient. We plan to integrate all the aforementioned omics data into predictive models that will form the basis for a fully personalized and innovative integrated decision support system for lung cancer. This WP is divided into three phases:

* Phase 1: Omics features identification and selection

* Phase 2: Predictive model development and DHA creation

* Phase 3: Predictive model and DHA validation

Omics features identification and selection:

In the first step, an exploratory analysis across all collected datasets from an estimated 240 NSCLC patients will start the biomarker identification process and narrow the vast amount of information down to a more selected pool of potential biomarkers. This first phase will employ robust data analysis techniques to identify relevant variables in a univariate setting, taking individual statistical distributions, feature-relevant correlations and general descriptive statistics into account.
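The univariate screening described above could look roughly like the following sketch. It is an illustration only, not the investigators' pipeline: the feature names, the toy values and both thresholds (`min_abs_r`, `max_pair_r`) are assumptions. Each feature is scored by its Pearson correlation with a binary outcome, weak features are dropped, and among strongly inter-correlated features only one is kept.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def screen_features(features, outcome, min_abs_r=0.3, max_pair_r=0.9):
    """Keep features that correlate with the outcome and are not redundant.

    features: dict mapping feature name -> list of values (one per patient)
    outcome:  list of 0/1 labels (one per patient)
    Both thresholds are illustrative assumptions.
    """
    # Rank features by the strength of their association with the outcome.
    scored = sorted(
        ((abs(pearson(values, outcome)), name) for name, values in features.items()),
        reverse=True,
    )
    kept = []
    for strength, name in scored:
        if strength < min_abs_r:
            continue  # too weakly associated with the outcome
        # Skip a feature that is nearly collinear with one already kept.
        if all(abs(pearson(features[name], features[k])) < max_pair_r for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): six patients, three candidate features.
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "tumour_volume": [1.0, 2.0, 1.5, 4.0, 5.0, 4.5],
    "tumour_volume_x2": [2.0, 4.0, 3.0, 8.0, 10.0, 9.0],  # redundant copy
    "random_marker": [3.0, 1.0, 2.0, 2.0, 1.0, 3.0],
}
kept = screen_features(features, outcome)
print(kept)
```

On this toy data the uncorrelated marker is discarded and only one of the two redundant volume features survives.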

Predictive model development and DHA creation:

The objective of the second phase is to create multiple distinct but modular multivariate models, trained through advanced ML and AI techniques and segmented into specific areas of interest, followed by the creation of the DHA. Different supervised models will be developed, including logistic regression, decision tree, support vector machine, random forest, XGBoost classifier, and artificial neural networks. K-fold cross-validation will be used for hyperparameter tuning, and the performance of the ML models will be compared for statistical significance. Predictive performance will be evaluated using accuracy (number of subjects correctly classified out of the total number of patients), precision (true positives out of total test positives), recall (sensitivity), F1 score (2 × precision × recall / (precision + recall)) and AUC-ROC.
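The performance measures named above can be computed directly from labels and predictions. The sketch below uses only the definitions given in the text, plus the standard rank interpretation of AUC-ROC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half). The labels, scores and the 0.5 decision threshold are made-up illustrative values.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: eight patients, model scores thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

Here three of four positives and three of four negatives are classified correctly, so accuracy, precision, recall and F1 all equal 0.75, while 15 of the 16 positive/negative pairs are ranked correctly, giving an AUC-ROC of 0.9375.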

DHA creation will involve integrating specific algorithms into the data extraction pipeline to clean and restructure the flow of data, while applying text mining and natural language processing to unstructured text. The results of this pre-processing will then be recoded through a specifically assigned ontology to reveal duplicates. This will lead to the creation of data marts that are updated continuously and automatically with new data. Based on the data already processed, the developed algorithm and its underlying infrastructure will be used to classify new patient data entered by clinicians through the interface. The data presented through the dynamic interface will allow thorough exploration of patient data already in the database, so that the best course of action can be inferred from historical data and the clinician's experience. This will lead to a more generalized exploration workflow that acts as a hypothesis generator for the user by clustering information according to custom criteria, thereby producing an exploratory analysis of the available data.
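As a rough illustration of how recoding through an ontology can reveal duplicates, the sketch below maps free-text terms to codes and flags records that collapse onto the same patient, code and date. Everything in it is hypothetical: the `ONTOLOGY` mapping, its codes, the record fields and the rule for what counts as a duplicate are assumptions made for the example, not the study's ontology.

```python
# Hypothetical mini-ontology: normalised free-text term -> local code.
ONTOLOGY = {
    "nsclc": "DX:NSCLC",
    "non small cell lung cancer": "DX:NSCLC",
    "non-small cell lung cancer": "DX:NSCLC",
    "lobectomy": "PROC:LOBECTOMY",
}


def normalise(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def find_duplicates(records):
    """Return (first_record, repeated_record) pairs sharing patient, code and date."""
    seen = {}
    duplicates = []
    for rec in records:
        code = ONTOLOGY.get(normalise(rec["term"]))
        if code is None:
            continue  # unmapped terms would need manual review, not auto-matching
        key = (rec["patient_id"], code, rec["date"])
        if key in seen:
            duplicates.append((seen[key], rec))
        else:
            seen[key] = rec
    return duplicates


# Hypothetical extracted records.
records = [
    {"patient_id": "P1", "term": "NSCLC", "date": "2024-01-10"},
    {"patient_id": "P1", "term": "Non-small cell  lung cancer", "date": "2024-01-10"},
    {"patient_id": "P1", "term": "Lobectomy", "date": "2024-02-01"},
    {"patient_id": "P2", "term": "nsclc", "date": "2024-01-10"},
]
duplicates = find_duplicates(records)
print(duplicates)
```

The two differently worded diagnosis entries for patient P1 resolve to the same code and are reported as one duplicate pair, while the same diagnosis for a different patient is not.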

The investigators estimate that approximately 300 NSCLC cases with complete data will be adequate to start this process. User-friendliness and model explainability will serve as the primary standards for the model development strategies. Easily interpretable measures such as SHAP (SHapley Additive exPlanations) values will be attached to each model to avoid black-box approaches that might make model outputs difficult to explain to patients during their interactions with clinicians.
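To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a single prediction of a tiny model by enumerating all feature subsets. It is a simplified illustration rather than the SHAP library: "absent" features are replaced by a single baseline value instead of being averaged over a background dataset, and the toy risk model, its weights and the input values are invented for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    predict:  function mapping a list of feature values to a number
    x:        feature values of the instance being explained
    baseline: reference values used in place of "absent" features
    """
    n = len(x)

    def value(present):
        # Evaluate the model with only the features in `present` taken from x.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical linear risk score over three features."""
    return 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]


x = [4.0, 1.0, 2.0]          # patient being explained (made-up values)
baseline = [2.0, 0.0, 3.0]   # reference patient (made-up values)
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

For a linear model each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction for the patient and the prediction for the baseline. Exhaustive enumeration grows exponentially with the number of features, so this form is practical only for very small models.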

Predictive model and DHA validation:

Both the developed model and the comprehensive DHA will be validated in order to test their robustness, transferability and generalizability. Two validation strategies will be applied consecutively: internal validation followed by external validation. We estimate that a total of approximately 420 NSCLC cases will be needed to start the validation process.
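One common way to organise internal and external validation in a multi-centre study is sketched below: records from one enrolment centre are held out entirely as an external set, and k-fold splits are drawn from the remaining records for internal validation. The record does not state how the cohorts will actually be split, so the centre-based hold-out, the centre labels, the record layout and the choice of k = 3 are all assumptions.

```python
import random


def split_by_centre(records, external_centre):
    """Separate records into an internal set and a held-out external set."""
    internal = [r for r in records if r["centre"] != external_centre]
    external = [r for r in records if r["centre"] == external_centre]
    return internal, external


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds over n items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Hypothetical cohort: nine patients spread over three centres.
records = [{"patient_id": f"P{i}", "centre": "ABC"[i % 3]} for i in range(9)]
internal, external = split_by_centre(records, external_centre="C")

splits = list(k_fold_indices(len(internal), k=3))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Each internal record appears in exactly one test fold, and the external centre's records are never seen during model development, which is what makes them usable for checking transferability.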

Recruitment & Eligibility

Status
NOT_YET_RECRUITING
Sex
All
Target Recruitment
600
Inclusion Criteria
  • Patients with (suspected) NSCLC
  • Age >18 yrs
  • ECOG 0-3
  • Written Informed Consent
Exclusion Criteria
  • ECOG 4
  • Psychosocial or emotional conditions contraindicating participation in the study

Study & Design

Study Type
OBSERVATIONAL
Study Design
Not specified
Arm & Interventions
Group
Enrolled patients
Intervention
Surgical resection
Description
Non-small cell lung cancer patients who underwent surgical resection. One part of this cohort will be used to build the predictive models and a second part to validate the created models.
Primary Outcome Measures
Name
To develop prevention models for early lung cancer diagnosis
Time
36 months
Method
Development of a prognostic model in NSCLC patients using omics data. In particular, the association of radiomic characteristics and biomarkers with lung cancer stage and survival outcome will be determined.

The omics data and prognostic models will be tested in terms of disease-free and overall survival, comparing the different models.
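A comparison of prognostic models in terms of survival is often summarised with a concordance index, which measures how well predicted risk orders patients by their observed survival times. The record does not name the statistic that will be used, so the choice of Harrell's C-index here is an assumption, as are the follow-up times, event flags and risk scores. In this simplified sketch, pairs with tied times are skipped.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death or relapse) was observed, 0 if censored
    risk:   predicted risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data for four patients and two competing models.
times = [5, 10, 15, 20]      # months
events = [1, 1, 0, 1]        # third patient is censored
risk_model_a = [0.9, 0.7, 0.4, 0.2]
risk_model_b = [0.2, 0.7, 0.4, 0.9]

c_a = concordance_index(times, events, risk_model_a)
c_b = concordance_index(times, events, risk_model_b)
print(c_a, c_b)
```

Model A ranks every comparable pair correctly (C = 1.0), whereas model B gets only one of the five comparable pairs right (C = 0.2). A value of 0.5 corresponds to random ordering.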

Secondary Outcome Measures
Not specified