Elsevier

Controlled Clinical Trials

Volume 21, Issue 6, December 2000, Pages 552-560
Sample-Size Calculations for the Cox Proportional Hazards Regression Model with Nonbinary Covariates

https://doi.org/10.1016/S0197-2456(00)00104-5

Abstract

This paper derives a formula to calculate the number of deaths required for a proportional hazards regression model with a nonbinary covariate. The method does not require assumptions about the distributions of survival time and predictor variables other than proportional hazards. Simulations show that the censored observations do not contribute to the power of the test in the proportional hazards model, a fact that is well known for a binary covariate. This paper also provides a variance inflation factor together with simulations for adjustment of sample size when additional covariates are included in the model. Control Clin Trials 2000;21:552–560

Introduction

In survival analysis, the Cox proportional hazards (PH) regression model assumes that the hazard function λ(t) for the survival time T given the predictors X1, X2, … , Xk has the following regression formulation:

$$\log\frac{\lambda(t \mid X)}{\lambda_0(t)} = \theta_1 X_1 + \theta_2 X_2 + \dots + \theta_k X_k$$

where λ0(t) is the baseline hazard. Survival analysis allows the response, the survival time T, to be censored. In the use of this model, one often wishes to test the effect of a specific predictor, X1, possibly in the presence of other predictors or covariates, on the response variable. The null hypothesis on the parameter θ1 is H0: [θ1, θ2, … , θk] = [0, θ2, … , θk], tested against the alternative [θ*, θ2, … , θk]. In the proportional hazards model, θ1 represents the predicted change in the log hazard for a one-unit change in X1 when covariates X2 to Xk are held constant.

When comparing two groups in a univariate model, the group indicator X1 is binary, and θ1 = logΔ is the log hazard ratio of the two groups. Cox proposed testing H0 with the Rao-type statistic, also known as the score statistic [1]. When there is only one binary covariate X1, the score test is the same as the Mantel-Haenszel test and the log-rank test if there are no ties in survival times. It is known that the power of the log-rank test depends on the sample size only through the number of deaths. This simplifies the sample-size formula. For comparing two groups, Schoenfeld derived the following formula:

$$D = (Z_{1-\alpha} + Z_{1-\beta})^2 \left[ P(1-P) (\log\Delta)^2 \right]^{-1} \qquad (1)$$

where D is the total number of deaths, P is the proportion of the sample assigned to the first treatment group, and Z1−α and Z1−β are standard normal deviates at the desired one-sided significance level α and power 1 − β, respectively [2]. Formula (1) was designed for a randomized comparison of two groups using survival analysis with covariates, although it applies to nonrandomized comparisons as well. In addition to formula (1), there are other sample-size formulas, such as Freedman's and Lakatos' formulas, proposed for the log-rank test to compare two survival distributions [3, 4]. Zhen and Murphy derived a formula for a nonbinary covariate assuming exponential survival time [5]. In this paper we derive a sample-size formula for a nonbinary covariate X1 without assuming exponential survival time by generalizing formula (1). However, in our experience, a nonbinary covariate occurs most often in a nonexperimental context such as an epidemiologic study. In this context, one must often adjust for other confounding covariates to appropriately model the effect of X1, the covariate of greatest interest.
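As a rough illustration of how formula (1) is evaluated, the following Python sketch computes the number of deaths for a two-group comparison. The function name `schoenfeld_deaths` and the design values (hazard ratio 2, equal allocation, one-sided α = 0.05, power 0.80) are hypothetical choices for the example, not values from the paper.

```python
import math
from statistics import NormalDist


def schoenfeld_deaths(hazard_ratio, p_group1, alpha=0.05, power=0.80):
    """Total deaths D from formula (1) for a binary covariate.

    alpha is the one-sided significance level, as in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # Z_{1-alpha}
    z_beta = NormalDist().inv_cdf(power)       # Z_{1-beta}
    log_hr = math.log(hazard_ratio)            # log(Delta)
    return (z_alpha + z_beta) ** 2 / (p_group1 * (1 - p_group1) * log_hr ** 2)


# Hypothetical design: equal allocation (P = 0.5), hazard ratio Delta = 2.
d = schoenfeld_deaths(hazard_ratio=2.0, p_group1=0.5)
print(math.ceil(d))  # rounded up to a whole number of deaths -> 52
```

Note that the result is a number of deaths, not a number of subjects; converting it to a total sample size requires an assumed overall death rate.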

Schoenfeld extended the multivariate model power calculation for binary X1 to the case that additional covariates X2, … , Xk are included [2]. His argument depends on the assumption that X1 is independent of X2, … , Xk, as would occur if X1 were randomly assigned in a controlled experiment. We show that Schoenfeld's argument also works when X1 is nonbinary and is independent of X2, … , Xk. This could be relevant in a randomized study if, for example, X1 records dose levels to which the study subjects are randomized. In epidemiologic studies, X1 is often a measure of a risk factor, such as numbers of cigarettes smoked per day, of interest to the investigators, and X2, … , Xk are possible confounders such as age and sex. By definition, covariates X2, … , Xk are correlated with the main factor of interest, X1, and formula (1) does not apply. We describe a method for adjusting sample sizes to preserve power when X1 is correlated with X2, … , Xk.

Section snippets

Sample-size method for nonbinary covariates

In a univariate model, without making assumptions about the distributions of covariate X1 and survival time T, the total number of deaths required is given by the following formula, derived in Appendix A:

$$D = (Z_{1-\alpha} + Z_{1-\beta})^2 \left[ \sigma^2 (\log\Delta)^2 \right]^{-1} \qquad (2)$$

where σ² is the variance of X1 and logΔ = θ* is the log hazard ratio associated with a one-unit change in X1. Formula (2) is similar to formula (1) except that the variance of X1, P(1 − P) in formula (1), is now replaced by a more general term, σ². The required
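A minimal Python sketch of formula (2) follows. The function name `deaths_required` and the example inputs are assumptions made for illustration. It also checks that setting the variance to P(1 − P) recovers the binary case of formula (1).

```python
import math
from statistics import NormalDist


def deaths_required(log_hazard_ratio, variance_x1, alpha=0.05, power=0.80):
    """Total deaths D from formula (2); alpha is one-sided."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (variance_x1 * log_hazard_ratio ** 2)


# Hypothetical continuous covariate: standard deviation 0.5,
# log hazard ratio theta* = 1.0 per one-unit change in X1.
d_continuous = deaths_required(log_hazard_ratio=1.0, variance_x1=0.5 ** 2)

# Binary covariate with P = 0.5: variance is P(1 - P) = 0.25,
# so formula (2) reduces to formula (1).
p = 0.5
d_binary = deaths_required(math.log(2.0), variance_x1=p * (1 - p))

print(round(d_continuous, 2), round(d_binary, 2))
```

Because D depends on the covariate only through the product σ² (logΔ)², rescaling X1 (and θ* correspondingly) leaves the required number of deaths unchanged.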

Power simulations (for a simple covariate x1)

Formula (2) is based on asymptotic arguments, in the limit of small effect size θ*, for a proportional hazards model with a possibly nonbinary predictor. In this section, these estimated sample sizes are compared with power simulations. For each set of design parameters, enough simulations (6200, 3500, and 1000 for powers of 80, 90, and 95%, respectively) were generated to calculate power, with simulation errors within 1%, using a published simulation program [6]. The simulations of survival
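The kind of check described here can be sketched in a few lines of Python. This is not the published simulation program [6]; it is a simplified stand-in that assumes exponential survival times, a standard normal covariate, no censoring, and a one-sided Cox score test evaluated at θ = 0. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def score_z(times, xs):
    """Standardized Cox score statistic at theta = 0.

    Assumes one covariate, every subject dies, and no tied times.
    """
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    n = s1 = s2 = 0.0
    u = v = 0.0
    for i in order:  # grow the risk set from the latest death backwards
        n += 1
        s1 += xs[i]
        s2 += xs[i] ** 2
        mean = s1 / n
        u += xs[i] - mean          # observed minus risk-set mean of X1
        v += s2 / n - mean ** 2    # risk-set variance of X1
    return u / math.sqrt(v)


def simulated_power(theta, n_deaths, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the score test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_deaths)]
        # exponential survival time with hazard exp(theta * x)
        times = [rng.expovariate(math.exp(theta * x)) for x in xs]
        if score_z(times, xs) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical check: theta* = 0.3 per unit of a standard normal X1.
theta = 0.3
z_sum = NormalDist().inv_cdf(0.95) + NormalDist().inv_cdf(0.80)
d = math.ceil(z_sum ** 2 / (1.0 * theta ** 2))  # formula (2) with sigma^2 = 1
print(d, simulated_power(theta, d))
```

If formula (2) is adequate for this design, the simulated power should land near the nominal 80%, up to Monte Carlo error.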

Effect of adjustment for covariates on power

In randomized trials, the baseline variables have no population correlation with the treatment variables. Therefore, the inclusion of baseline variables or nonconfounding covariates in a correctly specified linear model usually improves the precision of the estimate of the treatment parameter by reducing the residual variance in the model. The adjustment for baseline variables in a linear model thus increases the power of the analysis. The covariates also adjust for chance confounding (despite

Variance inflation factor

In a regression model, the variance of the estimate b1 of the parameter θ1 is inversely related to the variance of the corresponding covariate X1. For example, if we increase the scale of X1 by a factor of 10, the variance of X1 will increase by a factor of 100 and the variance of b1 will decrease by a factor of 100. If covariates explain some of the variance of X1, the same effect results. Let R be the multiple correlation coefficient ρ1.23 … k relating X1 with X2, … , Xk. Then R² is the proportion of variance
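The adjustment this section leads to can be sketched as follows, assuming the conventional variance inflation factor 1/(1 − R²); the excerpt above is cut off before stating the factor, so that form is an assumption here. The function name `inflate_deaths` and the input values are hypothetical.

```python
def inflate_deaths(d_unadjusted, r_squared):
    """Scale an unadjusted death count by an assumed VIF of 1 / (1 - R^2).

    r_squared is the squared multiple correlation of X1 with X2, ..., Xk.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return d_unadjusted / (1.0 - r_squared)


# Hypothetical inputs: 90 deaths from the univariate formula, R^2 = 0.25.
print(inflate_deaths(90, 0.25))  # -> 120.0
```

With R² = 0 the count is unchanged, which matches the case where X1 is independent of the other covariates.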

Example

A multiple myeloma data set from an example in the SAS PHGLM procedure is used to illustrate the sample-size calculation [14, 15]. In this data set, 65 patients were treated with alkylating agents and, during the study, 17 of the 65 survival times were censored. The data were fitted to a PH model to identify which of the nine prognostic factors are significant predictors.

Let us assume that LOGBUN is the variable X1 of interest. The standard deviation of LOGBUN is σ = 0.3126 and the R2 obtained

Discussion and conclusion

If the PH assumption holds with respect to the full model with k covariates, it may no longer hold if some of the k covariates are left out of the model [12]. In addition, the inclusion of confounding variables may be necessary on scientific grounds to produce meaningful estimates of the effects of the covariate X1. We assume that the model with k covariates is valid with a PH assumption, and we would like to estimate the required sample size for this model. The “naive” sample-size calculation

Acknowledgements

This work was supported by the Department of Veterans Affairs Cooperative Studies Program. The authors wish to thank an associate editor for suggesting the formula to generate survival times from bivariate normal variates.

References (16)

  • K. Akazawa et al. Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time. Computer Methods and Programs in Biomedicine (1991)
  • R.G. Miller. Survival Analysis (1981)
  • D.A. Schoenfeld. Sample-size formula for the proportional-hazards regression model. Biometrics (1983)
  • L.S. Freedman. Tables of the number of patients required in clinical trials using the logrank test. Stat Med (1982)
  • E. Lakatos. Sample sizes based on the logrank statistics in complex clinical trials. Biometrics (1988)
  • B. Zhen et al. Sample size determination for an exponential survival model with an unrestricted covariate. Stat Med (1994)
  • F.Y. Hsieh. Sample size tables for logistic regression. Stat Med (1989)
  • F.Y. Hsieh et al. A simple method of sample size calculation for linear and logistic regression. Stat Med (1998)
There are more references available in the full text version of this article.
