CriteriaMapper Automates Clinical Trial Cohort Identification Using EHR and NLP
- A new system called CriteriaMapper automates the identification of clinical trial cohorts by integrating data from electronic health records (EHR) and natural language processing (NLP).
- The system normalizes eligibility criteria (EC) attributes and clinical characteristics across seven domains: conditions, procedures, lab tests, therapies, biomarkers, observations, and diagnosis modifiers.
- CriteriaMapper uses rule-based knowledge engineering and standard terminologies to map EC attributes to clinical characteristics in EHRs, enhancing the efficiency and accuracy of patient matching for clinical trials.
- The system's performance was evaluated using inter-rater agreement and comparisons against a gold standard derived from EHR data, demonstrating its reliability for clinical phenotyping.
CriteriaMapper, a novel system leveraging electronic health records (EHR) and natural language processing (NLP), has been developed to automate the identification of clinical trial cohorts. This tool addresses the critical need for efficient and accurate patient matching in clinical trials, potentially accelerating pharmaceutical research and development.
The CriteriaMapper system integrates data from ClinicalTrials.gov with EHR data from the GeneDx (Sema4) data warehouse, which includes the Mount Sinai Data Warehouse (MSDW), covering approximately 3.9 million patients, and VieCure, a clinical decision support platform with 79,457 de-identified patients. The system focuses on normalizing eligibility criteria (EC) attributes and clinical characteristics across seven clinical domains: condition, procedure, lab test, therapy, biomarker, observation, and diagnosis modifier.
The CriteriaMapper system comprises three key components: rule-based knowledge engineering, normalization of EC attributes and clinical characteristics, and a clinical phenotyping knowledge base. Therapy-related data are classified into treatment, regimen, modality, mechanism of action (MOA), and medication, and are mapped using resources such as the Cancer Alteration Viewer (CAV) and disease treatment guidelines.
Normalization standardizes attributes using established code systems such as ICD, CPT, LOINC, and RxNorm. For lab tests, the system considers the system (specimen), quantity, time, scale type, and method in order to map each test to a LOINC code. Medications are normalized by retrieving synonyms from the UMLS Metathesaurus and RxNorm. Because EC attributes and clinical characteristics often do not match exactly, the system also maps attributes at different levels of granularity and accounts for the additional details carried by standard terminologies.
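To make the normalization step concrete, here is a minimal Python sketch of the idea, not CriteriaMapper's actual implementation. The names LabAxes, LOINC_TABLE, MED_SYNONYMS, map_lab_to_loinc, and normalize_medication are hypothetical, and the two-entry lookup tables are illustrative stand-ins for the full LOINC, UMLS, and RxNorm resources. It assumes a lab test is described by its component plus the axes named above, and that a method-specific request may fall back to a coarser, method-free entry.

```python
from typing import NamedTuple, Optional


class LabAxes(NamedTuple):
    """Hypothetical description of a lab test along LOINC-style axes."""
    component: str
    quantity: str            # kind of quantity, e.g. "MCnc" (mass concentration)
    time: str                # e.g. "Pt" (point in time)
    system: str              # specimen, e.g. "Ser/Plas" or "Bld"
    scale: str               # e.g. "Qn" (quantitative)
    method: Optional[str] = None


# Tiny illustrative table; a real system would load a full LOINC release.
LOINC_TABLE = {
    LabAxes("glucose", "MCnc", "Pt", "Ser/Plas", "Qn"): "2345-7",
    LabAxes("hemoglobin", "MCnc", "Pt", "Bld", "Qn"): "718-7",
}

# Illustrative synonym table standing in for UMLS/RxNorm synonym retrieval.
MED_SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
}


def map_lab_to_loinc(axes: LabAxes) -> Optional[str]:
    """Return a code for an exact match, else retry without the method axis."""
    code = LOINC_TABLE.get(axes)
    if code is None and axes.method is not None:
        # Coarser-level match: ignore the method detail (an assumption here).
        code = LOINC_TABLE.get(axes._replace(method=None))
    return code


def normalize_medication(name: str) -> Optional[str]:
    """Map a free-text medication name to a canonical name, if known."""
    return MED_SYNONYMS.get(name.strip().lower())
```

For example, a criterion that specifies a measurement method absent from the table still resolves to the method-free glucose entry, which mirrors the "different levels" matching described above.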
Quality assurance was conducted on annotated and normalized EC attributes and EHR clinical characteristics in the condition, procedure, lab test, and therapy domains. Annotations drawn from the Redshift database were assessed by two curators, with inter-rater agreement measured by Cohen's kappa coefficient. The system's output was also compared against a gold standard derived from EHR data, using precision, recall, and F1-score.
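The evaluation metrics named above follow standard definitions, which can be sketched in a few lines of Python. This is a generic illustration, not the study's evaluation code; the function names and the toy inputs are hypothetical. Cohen's kappa is computed as (observed agreement minus chance agreement) divided by (1 minus chance agreement), and precision, recall, and F1 are computed over sets of matched items.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty list of items")
    n = len(rater_a)
    # Fraction of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With two curators labeling four annotations as ["yes", "yes", "no", "no"] and ["yes", "no", "no", "no"], observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.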

Reference News
[1]
CriteriaMapper: establishing the automatic identification of clinical trial cohorts from ... - Nature
nature.com · Oct 25, 2024
Data from ClinicalTrials.gov, EHR (GeneDx), MSDW, and VieCure were used, covering 3.9 million patients and 79,457 de-ide...