The Lancet

Volume 365, Issue 9467, 9–15 April 2005, Pages 1348–1353

Series
Sample size calculations in randomised trials: mandatory and mystical

https://doi.org/10.1016/S0140-6736(05)61034-3

Summary

Investigators should properly calculate sample sizes before the start of their randomised trials and adequately describe the details in their published report. In these a-priori calculations, determining the effect size to detect—eg, event rates in treatment and control groups—reflects inherently subjective clinical judgments. Furthermore, these judgments greatly affect sample size calculations. We question the branding of trials as unethical on the basis of an imprecise sample size calculation process. So-called underpowered trials might be acceptable if investigators use methodological rigour to eliminate bias, properly report to avoid misinterpretation, and always publish results to avert publication bias. Some shift of emphasis from a fixation on sample size to a focus on methodological quality would yield more trials with less bias. Unbiased trials with imprecise results trump no results at all. Clinicians and patients deserve guidance now.

Section snippets

Components of sample size calculations

Calculating sample sizes for trials with dichotomous outcomes (eg, sick vs well) requires four components: type I error (α), power, event rate in the control group, and a treatment effect of interest (or analogously an event rate in the treatment group). These basic components persist through calculations with other types of outcomes, although additional assumptions may be necessary. For example, with quantitative outcomes and a typical statistical test, investigators might assume a difference between
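To make the four components concrete, here is a minimal sketch (ours, not the authors') of the standard normal-approximation formula for comparing two proportions; the function name and event rates are illustrative.

```python
from math import ceil, sqrt

from scipy.stats import norm

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison
    of two proportions (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # corresponds to 1 - beta
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# eg, to detect a fall in event rate from 20% to 10% with the usual conventions
print(n_per_group(0.20, 0.10))  # ≈199 per group
```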

Effect of selecting α error and power

The conventions of α=0·05 and power=0·80 usually suffice. However, other assumptions make sense depending on the topic studied. For example, if a standard prophylactic antibiotic for hysterectomy is effective with few side-effects, in a trial of a new antibiotic we might set the α error lower (eg, 0·01) to reduce the chances of a false-positive conclusion. We might even consider lowering the power below 0·80 because of our reduced concern about missing an effective treatment—an effective safe treatment
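As an illustration of this sensitivity (ours, with the same hypothetical 20% vs 10% event rates), statsmodels can solve for n across several (α, power) pairs; it works on Cohen's arcsine effect size, so its figures differ slightly from the formula above.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.20, 0.10)  # Cohen's h for 20% vs 10% event rates
solver = NormalIndPower()
for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        n = solver.solve_power(effect_size=h, alpha=alpha, power=power)
        print(f"alpha={alpha}, power={power}: n ≈ {n:.0f} per group")
# n per group roughly doubles across these choices (≈195 up to ≈370)
```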

Estimation of population parameters

For some investigators, estimation of population parameters—eg, event rates in the treatment and control groups—has mystical overtones. Some researchers scoff at this notion, since estimating the parameters is the aim of the trial: needing to do it before the trial seems ludicrous. The key point, however, is that they are not estimating the population parameters per se but the treatment effect they deem worthy of detecting. That is a big difference.

Usually, investigators start by estimating the
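A small sensitivity check (our illustration, with hypothetical rates) shows why this guess matters: holding a 25% relative risk reduction fixed, the required size swings more than tenfold across plausible control event rates.

```python
from math import ceil, sqrt

from scipy.stats import norm

z_a, z_b = norm.ppf(0.975), norm.ppf(0.80)   # alpha=0.05 two-sided, power=0.80
for p_c in (0.40, 0.20, 0.10, 0.05):         # guessed control event rates
    p_t = 0.75 * p_c                          # fixed 25% relative risk reduction
    p_bar = (p_c + p_t) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p_c * (1 - p_c) + p_t * (1 - p_t))) ** 2
         / (p_c - p_t) ** 2)
    print(f"control rate {p_c:.2f}: n ≈ {ceil(n)} per group")
# ≈356 per group at a 40% control rate, but ≈4202 per group at 5%
```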

Low power with limited available participants

What happens when sample size software—in view of an investigator's diligent estimates—yields a trial size that exceeds the number of available participants? Frequently, investigators then calculate backwards and estimate that they have low power (eg, 0·40) for their available participants. This practice may be more the rule than the exception.9

Some methodologists advise clinicians to abandon such a low-power study. Many ethics review boards deem a low-power trial unethical.10, 11, 12 Chalmers'
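The backward calculation itself is simple; under the same normal approximation used above (our sketch, with hypothetical numbers), power follows directly from the available n.

```python
from math import sqrt

from scipy.stats import norm

def power_given_n(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of two proportions
    when only n_per_group participants are available in each group."""
    z_alpha = norm.ppf(1 - alpha / 2)
    diff = abs(p_control - p_treatment)
    se = sqrt((p_control * (1 - p_control)
               + p_treatment * (1 - p_treatment)) / n_per_group)
    return norm.cdf(diff / se - z_alpha)

# eg, only 60 participants per group for the 20% vs 10% comparison
print(power_given_n(0.20, 0.10, 60))  # ≈0.34, well below the usual 0.80
```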

Sample size samba

Investigators sometimes perform a “sample size samba” to achieve adequate power.27, 28 The dance involves retrofitting of the parameter estimates (in particular, the treatment effect worthy of detection) to the available participants. This practice seems fairly common in our experience and in that of others.27 Moreover, funding agencies, protocol committees, and even ethics review boards might encourage this backward process. It represents an operational solution to a real problem. In view of
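To see what the samba amounts to, invert the power calculation: fix the available n and solve for the treatment event rate that would give 0·80 power (a sketch of ours with hypothetical numbers; the root-finder does the retrofitting).

```python
from math import sqrt

from scipy.stats import norm
from scipy.optimize import brentq

def power_given_n(p_c, p_t, n, alpha=0.05):
    z_alpha = norm.ppf(1 - alpha / 2)
    se = sqrt((p_c * (1 - p_c) + p_t * (1 - p_t)) / n)
    return norm.cdf(abs(p_c - p_t) / se - z_alpha)

# With 60 per group and a 20% control event rate, which treatment event
# rate yields 0.80 power? Solve power(p_t) = 0.80 for p_t.
p_t = brentq(lambda p: power_given_n(0.20, p, 60) - 0.80, 1e-6, 0.19)
print(f"detectable treatment rate ≈ {p_t:.3f}")  # ≈0.04: an 80% relative drop
```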

Sample size modification

With additional available participants and resource flexibility, investigators could consider a sample size modification strategy, which would alleviate some of the difficulties with rough guesses used in the initial sample size calculations. Usually, modifications lead to increased sample sizes,29 so investigators should have access to the participants and the funding to accommodate the modifications.

Approaches to modification rely on revision of the event rate, the variance of the endpoint,
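One simple blinded scheme (a sketch of ours, not a method endorsed in the paper) pools the interim event rate without unblinding, retains the originally assumed relative effect, and recomputes n.

```python
from math import ceil, sqrt

from scipy.stats import norm

def n_per_group(p_c, p_t, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p_c + p_t) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_c * (1 - p_c) + p_t * (1 - p_t))) ** 2
    return ceil(num / (p_c - p_t) ** 2)

planned = n_per_group(0.20, 0.10)  # design assumed treatment halves a 20% rate

# Blinded interim look: 30 events among 300 participants, a pooled rate of
# 0.10 rather than the anticipated (0.20 + 0.10) / 2 = 0.15. Keeping the
# assumed halving, the pooled rate implies p_c * (1 + 0.5) / 2 = 0.10.
p_c = 0.10 / 0.75
revised = n_per_group(p_c, 0.5 * p_c)
print(planned, revised)  # 199 planned; ≈317 revised upwards, as is typical
```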

Futility of post hoc power calculations

A trial yields a treatment effect and confidence interval for the results. The power of the trial is expressed in that confidence interval. Hence, the power is no longer a meaningful concern.7, 27, 34 Nevertheless, after trial completion, some investigators do power calculations on statistically non-significant trials using the observed results for the parameter estimates. This exercise has specious appeal, but tautologically yields an answer of low power.7, 27 In other words, this ill-advised
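The tautology is easy to demonstrate (our illustration): when the observed effect is plugged in as the effect to detect, "observed power" is merely a transformation of the p value, so a non-significant result can never exhibit high power.

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)
for p_obs in (0.05, 0.10, 0.30, 0.60):
    z_obs = norm.ppf(1 - p_obs / 2)            # |z| implied by the observed p
    post_hoc_power = norm.cdf(z_obs - z_crit)  # observed effect taken as "true"
    print(f"p = {p_obs:.2f}  ->  observed power ≈ {post_hoc_power:.2f}")
# p = 0.05 maps to exactly 0.50; every larger (non-significant) p maps lower.
```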

What should readers look for in sample size calculations?

Readers should find the a-priori estimates of sample size. Indeed, in trial reports, confidence intervals appropriately indicate the power. However, sample size calculations still provide important information. First, they specify the primary endpoint, which safeguards against changing outcomes and claiming a large effect on an outcome not planned as the primary outcome.35 Second, knowing the planned size alerts readers to potential problems. Did the trial encounter recruitment difficulties?
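The confidence interval does the work that a post hoc power calculation cannot; here is a quick sketch (ours, with hypothetical counts) of the Wald interval for a risk difference.

```python
from math import sqrt

from scipy.stats import norm

# Hypothetical trial report: 12/60 events on control, 8/60 on treatment
e_c, n_c, e_t, n_t = 12, 60, 8, 60
p_c, p_t = e_c / n_c, e_t / n_t
diff = p_c - p_t
se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z = norm.ppf(0.975)
print(f"risk difference {diff:.3f}, "
      f"95% CI ({diff - z * se:.3f}, {diff + z * se:.3f})")
# A wide interval straddling zero flags an imprecise trial, whatever its p value.
```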

Conclusions

Statistical power is an important notion, but it should be stripped of its ethical bellwether status. We question the branding of trials as unethical based solely on an inherently subjective, imprecise sample size calculation process. We endorse planning for adequate power, and we salute large multicentre trials of the ISIS-2 ilk;43 indeed, more such studies should be undertaken. However, if the scientific world insisted solely on large trials, many unanswered questions in medicine would

References (43)

  • KF Schulz et al. Generation of allocation sequences in randomised trials: chance, not choice. Lancet (2002)
  • KF Schulz et al. Unequal group sizes in randomised trials: guarding against guessing. Lancet (2002)
  • DG Altman et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med (2001)
  • JA Freiman et al. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 "negative" trials. N Engl J Med (1978)
  • DL Sackett et al. Can we learn anything from small trials? Ann N Y Acad Sci (1993)
  • SJ Pocock. Clinical trials: a practical approach (1983)
  • CL Meinert. Clinical trials: design, conduct, and analysis (1986)
  • S Piantadosi. Clinical trials: a methodologic perspective (1997)
  • SK Sinei et al. Preventing IUCD-related pelvic infection: the efficacy of prophylactic doxycycline at insertion. Br J Obstet Gynaecol (1990)
  • JN Matthews. Small clinical trials: are they all bad? Stat Med (1995)
  • SD Halpern et al. The continuing unethical conduct of underpowered clinical trials. JAMA (2002)