The use of site-stratified Cox regression analysis in the PANAMO phase III study of COVID-19 mortality exposed a significant limitation of the method: it caused data attrition and distorted the p-value. The study, which investigated the effect of vilobelimab on mortality, ran into problems when the analysis was stratified by site to account for heterogeneity across sites.
Impact of Site Stratification on Data Analysis
A site-stratified Cox regression builds the risk sets separately within each site, so that each site can have its own baseline hazard. As a result, data from sites with no events (deaths) and from sites with only one enrolled patient are effectively excluded, because these sites contribute nothing to the partial likelihood. In the PANAMO study, this affected 61 patients (16.6% of total enrollment): 55 patients from sites with no deaths and 6 patients from single-patient sites, all of whom died.
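The exclusion follows directly from the form of the stratified partial likelihood. The expression below is the standard textbook form, ignoring tied event times; the notation is an assumption introduced here and is not taken from the study: s indexes sites, D_s is the set of patients in site s who died, R_s(t_i) is the set of patients in site s still at risk at the death time t_i, and x_i is the treatment indicator.

```latex
L(\beta) \;=\; \prod_{s=1}^{S} \; \prod_{i \in D_s}
  \frac{\exp(\beta x_i)}{\sum_{j \in R_s(t_i)} \exp(\beta x_j)}
```

If a site has no deaths, D_s is empty and the inner product is an empty product equal to 1. If a site has a single patient who dies, R_s(t_i) contains only that patient, so the ratio is exp(βx_i) / exp(βx_i) = 1. In both cases the site's factor does not depend on β and carries no information about the treatment effect.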
Consequence of Data Exclusion
The exclusion of these 61 patients introduced bias, because all six deaths from single-patient sites occurred in the placebo group. Removing these patients from the analysis led to an underestimation of the treatment effect of vilobelimab. This hidden bias, a combination of reduced effective sample size and unbalanced treatment allocation among the remaining patients, pushed the p-value above the significance level.
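The effect can be checked numerically. The following is a minimal sketch in plain Python, not the study's analysis code: it evaluates the Cox partial log-likelihood for a single binary treatment covariate on a small hypothetical data set. The function name `partial_loglik`, the record layout, and all values are assumptions made for illustration only.

```python
import math


def partial_loglik(beta, patients, stratified=True):
    """Cox partial log-likelihood for one binary covariate (Breslow form).

    patients: list of (site, time, died, treated) tuples.
    """
    # Group patients into strata: one per site, or a single pooled stratum.
    strata = {}
    for site, time, died, treated in patients:
        key = site if stratified else "pooled"
        strata.setdefault(key, []).append((time, died, treated))

    loglik = 0.0
    for members in strata.values():
        for time, died, treated in members:
            if not died:
                continue  # censored patients only appear in risk sets
            # Risk set: patients in the same stratum still observed at `time`.
            denom = sum(math.exp(beta * x) for t, _, x in members if t >= time)
            loglik += beta * treated - math.log(denom)
    return loglik


# Hypothetical toy records: (site, day of death or censoring, died, treated).
core = [
    ("A", 5, 1, 0), ("A", 8, 0, 1), ("A", 12, 1, 1), ("A", 20, 0, 0),
    ("B", 3, 1, 0), ("B", 9, 1, 0), ("B", 15, 0, 1), ("B", 28, 0, 1),
]
extra = [
    ("C", 28, 0, 1), ("C", 28, 0, 0),  # site with no deaths
    ("D", 7, 1, 0),                    # single-patient site, placebo death
    ("E", 10, 1, 0),                   # single-patient site, placebo death
]

for beta in (-0.5, 0.0, 0.5):
    same_stratified = math.isclose(
        partial_loglik(beta, core + extra), partial_loglik(beta, core),
        abs_tol=1e-12)
    same_pooled = math.isclose(
        partial_loglik(beta, core + extra, stratified=False),
        partial_loglik(beta, core, stratified=False),
        abs_tol=1e-12)
    print(beta, same_stratified, same_pooled)
```

In this toy setting the site-stratified log-likelihood is the same with and without the patients from sites C, D, and E at every value of beta tried, so those patients, including the two placebo deaths, cannot influence the stratified estimate. The pooled (unstratified) log-likelihood does change, because those patients enter the shared risk sets.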
Alternative Analytical Approaches
Analyzing the data set with the method originally proposed in the protocol, Cox regression without site stratification, gave a positive finding with a hazard ratio (HR) of 0.67 and a p-value of 0.026. The FDA adopted this method in its published review as the more reliable approach.
Addressing Geographic Diversity and Population Heterogeneity
To account for geographic diversity and population heterogeneity without the technical challenges posed by site-level risk sets, country-level or region-level stratification may be more appropriate. Differences between national healthcare systems, such as variations in intensive care treatment modalities, drug approvals, and staffing, may significantly affect mortality. Country-stratified and region-stratified Cox models, as well as multilevel frailty Cox models with random effects, yielded positive findings with hazard ratios in similar ranges. Consistent patterns were observed in pre-specified sensitivity analyses using logistic regression, in post-hoc simple group comparisons via log-rank tests, and for the key secondary endpoint of 60-day all-cause mortality.
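The reason coarser strata lose fewer patients can be illustrated with a short sketch. It is a hypothetical example, not PANAMO data: the site codes, the convention that the first two letters of a site code identify the country, and the helper name `uninformative_patients` are all assumptions. The function applies only the two exclusion criteria named above (no deaths in the stratum, or a single patient in the stratum).

```python
from collections import defaultdict


def uninformative_patients(patients, stratum_of):
    """Count patients in strata with no deaths or with only one patient.

    patients: list of (site, died) tuples.
    stratum_of: function mapping a site code to its stratum label.
    """
    sizes = defaultdict(int)
    deaths = defaultdict(int)
    for site, died in patients:
        key = stratum_of(site)
        sizes[key] += 1
        deaths[key] += died
    return sum(n for key, n in sizes.items() if deaths[key] == 0 or n == 1)


# Hypothetical records: (site code, died). Country = first two letters.
patients = [
    ("NL-01", 1), ("NL-01", 0), ("NL-01", 0),
    ("NL-02", 0), ("NL-02", 0),   # site with no deaths
    ("BR-01", 1),                 # single-patient site with a death
    ("BR-02", 1), ("BR-02", 0),
    ("DE-01", 0),                 # single-patient site, no death
]

by_site = uninformative_patients(patients, stratum_of=lambda site: site)
by_country = uninformative_patients(patients, stratum_of=lambda site: site[:2])
print("dropped under site strata:", by_site)
print("dropped under country strata:", by_country)
```

Merging small sites into country-level strata puts their patients into risk sets that contain deaths and more than one patient, so fewer records fall out of the partial likelihood. The trade-off is that any heterogeneity between sites within a country is no longer absorbed by separate baseline hazards.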