MedPath

Machine Learning Model Predicts Colorectal Cancer Prognosis Based on CD4+ T Cell Genes

9 months ago · 3 min read

Key Insights

  • A novel machine learning model using eight CD4+ T cell-related genes (CD4TGs) accurately predicts colorectal cancer (CRC) patient prognosis.

  • The model identifies high-risk patients with poorer outcomes, validated across TCGA and GEO datasets, showing consistent and stable risk stratification.

  • High-risk patients exhibit distinct immune profiles, including increased immune cell infiltration and elevated expression of immune checkpoint genes.

A new machine learning model leveraging the expression of eight genes associated with CD4+ conventional T cells (CD4Tconv) has demonstrated significant accuracy in predicting the prognosis of colorectal cancer (CRC) patients. The study, published in Scientific Reports, highlights the potential of this model to identify high-risk individuals who may benefit from more aggressive treatment strategies.
The research team analyzed single-cell sequencing data from CRC samples, identifying distinct immune cell subtypes, including CD4Tconv cells. These cells play a crucial role in T cell-mediated immune responses, expressing signature genes like CD3D, CD3E, and CD4. Further analysis revealed 172 differentially expressed genes (DEGs) associated with CD4Tconv cells in CRC, with IFNG and TNF identified as the most interconnected core genes.

Prognostic Model Development and Validation

Univariate Cox regression analysis identified eight genes (HSPA1A, CXCR5, CTSD, PTGER2, FGF12, APOD, TP63, and LGALS4) whose differential expression significantly correlated with CRC patient outcomes. These genes were then integrated into a machine learning ensemble approach, utilizing leave-one-out cross-validation (LOOCV) to develop 101 models. The Elastic Net model (Enet with α = 0.8) emerged as the most effective, achieving an average C-index of 0.604.
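For readers unfamiliar with the C-index, the sketch below shows how a concordance index of this kind is typically computed. It is a simplified illustration of Harrell's C-index, not the study's code: the function name `concordance_index` and the toy data are assumptions, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    times: follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: model output, higher meaning worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let patient a be the one with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an event.
        # Tied times are skipped here for simplicity (an assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [2.0, 0.2, 0.5, -1.0]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported average of 0.604 in context.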
Risk scores (RS) were calculated for each patient based on the expression of the eight genes. Patients were categorized into high- and low-risk groups using a cutoff of zero. Kaplan-Meier survival curves demonstrated that high-risk patients had significantly worse prognoses in both the TCGA training set and the GEO validation set, confirming the model's consistency and stability.
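As a rough illustration of the stratification step, the sketch below computes a risk score as a weighted sum of the eight genes' expression values and splits patients at zero. The coefficients and their signs are hypothetical placeholders, since the article does not report the fitted weights, and assigning a score of exactly zero to the low-risk group is also an assumption.

```python
# Hypothetical coefficients for illustration only; the article does not
# report the fitted Elastic Net weights or their signs.
HYPOTHETICAL_COEFFICIENTS = {
    "HSPA1A": 0.2,
    "CXCR5": -0.3,
    "CTSD": 0.1,
    "PTGER2": -0.2,
    "FGF12": 0.4,
    "APOD": 0.3,
    "TP63": 0.1,
    "LGALS4": -0.5,
}


def risk_score(expression, coefficients=HYPOTHETICAL_COEFFICIENTS):
    """Linear risk score: sum of coefficient * expression over the gene panel."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def risk_group(score, cutoff=0.0):
    """Assign 'high' above the cutoff, 'low' otherwise (cutoff of zero per the article)."""
    return "high" if score > cutoff else "low"


# Two made-up patients with made-up expression values.
patient_a = {gene: 1.0 for gene in HYPOTHETICAL_COEFFICIENTS}
patient_b = dict(patient_a, LGALS4=2.0)

score_a = risk_score(patient_a)
score_b = risk_score(patient_b)
group_a = risk_group(score_a)
group_b = risk_group(score_b)
```

With these placeholder weights, the first patient scores above zero and is labeled high-risk, while the second, who differs only in higher LGALS4 expression, scores below zero and is labeled low-risk.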

Clinical Relevance and Immune Microenvironment

Further analysis revealed that age, cancer staging, and RS significantly influenced survival rates (P < 0.05). ROC curve analysis showed that the risk score had the highest predictive power, with an AUC value of 0.705. Calibration analysis affirmed the model's robustness in forecasting survival rates at 1, 3, and 5 years, with a C-index of 0.781 (95% confidence interval: 0.733–0.829).
Gene set enrichment analysis (GSEA) identified distinct biological pathways in the high- and low-risk groups. The high-risk group showed enrichment of gene sets linked to the immune response and extracellular matrix remodeling, while the low-risk group showed enrichment of gene sets related to intracellular metabolism and protein synthesis.
Immune infiltration analysis revealed a positive association between immune cell infiltration and RS, particularly for macrophages (M0, M1, and M2 subtypes), CD4+ T cells, and CD8+ T cells. Conversely, RS correlated negatively with plasma B cells. Expression levels of immune checkpoint-related genes, including CD274 (PD-L1), CTLA4, and PDCD1 (PD-1), were markedly elevated in the high-risk group.

Chemotherapy Sensitivity

Sensitivity analysis of various chemotherapy drugs showed that the low-risk group had lower half-maximal inhibitory concentrations (IC50), indicating higher sensitivity. Notably, the low-risk group was more sensitive to oxaliplatin, a commonly used chemotherapy drug in CRC.

Gene Expression Validation

qRT-PCR and immunohistochemistry (IHC) confirmed the differential expression of the eight genes in CRC tissues. APOD and TP63 levels were higher in cancerous tissues, while HSPA1A, CXCR5, CTSD, PTGER2, FGF12, and LGALS4 levels were higher in normal tissues.
These findings suggest that the machine learning model based on CD4Tconv-related genes can effectively predict CRC prognosis and identify patients who may benefit from tailored treatment strategies, including chemotherapy and immunotherapy.
© 2025 MedPath, Inc. All rights reserved.