Computational methods for case-cohort studies

https://doi.org/10.1016/j.csda.2006.12.028

Abstract

Computational methods, which can be implemented using standard Cox regression software, are given for fitting "exact" pseudolikelihood estimates and for computing robust and asymptotic variance estimators from case-cohort data. These methods are based on the computational approach of Therneau and Li [1999. Computing the Cox model for case cohort designs. Lifetime Data Anal. 5, 99–112] but are less subject to small-sample bias. Further, it is shown how to accommodate time-dependent covariates and how to estimate absolute risk. Extensions to stratified case-cohort sampled data are also provided. The methods are illustrated with SAS software in analyses of case-cohort samples from a study of breast cancer and radiation exposure from fluoroscopy.

Introduction

Since the introduction of methods for the analysis of case-based sampled failure time data (Prentice, 1986), the "case-cohort" design has become increasingly popular in epidemiologic and clinical research. The theory provides a full range of analytic tools for the Cox proportional hazards model (Cox, 1972), and some of the most important computational challenges have been addressed. In particular, drawing on the theoretical results of Self and Prentice (1988), Barlow (1994) and Lin and Ying (1993), Therneau and Li (1999) (T&L99) solved the computational problem of variance estimation. However, practical data analysis methods have not yet been described for some of the analytic tools the theory provides. In this paper, we provide methods for a number of outstanding computational problems when the subcohort is a simple random sample, including fitting Prentice's pseudolikelihood exactly, analyzing data with time-dependent covariates, and estimating absolute risk from the cumulative hazard with appropriate confidence intervals. Further, we show how the proposed methods can be extended to accommodate stratified random sampling of the subcohort, whether the analysis uses the corresponding stratified Cox model or the unstratified one. The methods are easily implemented using standard Cox regression software.
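To make "fitting Prentice's pseudolikelihood" concrete, here is a minimal Python sketch that evaluates the pseudolikelihood directly for a single covariate and maximizes it by Newton-Raphson. It is not the authors' SAS-based approach, which works through standard Cox software. In Prentice's (1986) pseudolikelihood, the risk set at each case's failure time is the set of subcohort members still at risk plus the case itself, whether or not that case was sampled into the subcohort. The names `Subject`, `prentice_terms` and `fit_prentice` are hypothetical, and the sketch assumes no tied failure times, entry at time 0 and fixed (not time-dependent) covariates.

```python
import math
from dataclasses import dataclass


@dataclass
class Subject:
    time: float       # exit time (failure or censoring)
    event: bool       # True if the subject is a case
    z: float          # single fixed covariate
    subcohort: bool   # True if sampled into the subcohort


def prentice_terms(data, beta):
    """Log pseudolikelihood, score and information at beta.

    Risk set at case j's failure time: subcohort members with time >= t_j,
    plus case j itself. Assumes no ties and entry at time 0.
    """
    loglik, score, info = 0.0, 0.0, 0.0
    for j in data:
        if not j.event:
            continue
        risk = [k for k in data if k.subcohort and k.time >= j.time]
        if not j.subcohort:
            risk.append(j)  # non-subcohort cases appear only in their own risk set
        w = [math.exp(beta * k.z) for k in risk]
        s0 = sum(w)
        s1 = sum(wk * k.z for wk, k in zip(w, risk))
        s2 = sum(wk * k.z ** 2 for wk, k in zip(w, risk))
        zbar = s1 / s0
        loglik += beta * j.z - math.log(s0)
        score += j.z - zbar
        info += s2 / s0 - zbar ** 2
    return loglik, score, info


def fit_prentice(data, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the pseudolikelihood."""
    for _ in range(max_iter):
        _, score, info = prentice_terms(data, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

At the returned value the pseudo-score is numerically zero. The inverse of `info` is not a valid variance here, because the pseudolikelihood is not a true likelihood. This is why robust or asymptotic variance estimators are needed.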

To illustrate the methods, we analyzed simple and age-at-first-exposure-stratified case-cohort samples drawn from a cohort of 1741 female patients discharged from two tuberculosis sanatoria in Massachusetts between 1930 and 1956, in order to investigate the relation between breast cancer risk and radiation exposure from fluoroscopy (Boice and Monson, 1977; Hrubec et al., 1989). Radiation doses were estimated for the women who received chest radiation exposure from X-ray fluoroscopic lung examinations. The remaining women received treatments that did not require fluoroscopic monitoring and were not exposed to radiation. Seventy-five breast cancer cases were identified, 54 exposed and 21 unexposed. Each case-cohort sample used to illustrate the methods consists of a subcohort of 100 sampled subjects plus all breast cancer cases not in the subcohort. For the simple case-cohort sample, the 100 subjects were randomly sampled without replacement from the entire cohort. For the stratified case-cohort sample, the 100 subjects were sampled from age-at-first-exposure strata (<15, 15–19, 20–29, 30+) in numbers proportional to the number of breast cancer cases in each stratum. For clarity of presentation, we illustrate the analysis methods using SAS code that is easy to understand but is not convenient for general-purpose analyses. SAS macros better suited to practical use are available at http://hydra.usc.edu/timefactors.
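The two subcohort sampling schemes described above can be sketched in Python as follows. This is an illustration under stated assumptions. All function names are hypothetical. The per-stratum case counts in the usage line are invented, because the actual counts are not given here. Leftover slots after proportional rounding go to the strata with the largest fractional shares (the largest-remainder rule), which is one reasonable choice and not necessarily the one used in the study.

```python
import random


def simple_subcohort(cohort_ids, m, seed=None):
    """Simple random sample of m subjects drawn without replacement."""
    return set(random.Random(seed).sample(list(cohort_ids), m))


def proportional_allocation(total, case_counts):
    """Split `total` subcohort slots across strata in proportion to case counts.

    Rounds down each share, then gives the remaining slots to the strata with
    the largest fractional parts (largest-remainder rule, an assumption).
    """
    n_cases = sum(case_counts.values())
    exact = {s: total * c / n_cases for s, c in case_counts.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_subcohort(ids_by_stratum, alloc, seed=None):
    """Sample alloc[s] subjects without replacement from each stratum s."""
    rng = random.Random(seed)
    chosen = set()
    for stratum, ids in ids_by_stratum.items():
        chosen.update(rng.sample(list(ids), alloc[stratum]))
    return chosen


def case_cohort_sample(subcohort, case_ids):
    """Subcohort plus every case, whether or not the case was sampled."""
    return set(subcohort) | set(case_ids)


# Hypothetical case counts per age-at-first-exposure stratum (sum = 75):
print(proportional_allocation(100, {"<15": 10, "15-19": 20, "20-29": 30, "30+": 15}))
# -> {'<15': 13, '15-19': 27, '20-29': 40, '30+': 20}
```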

Section snippets

Simple (unstratified) case-cohort samples

In this section, we focus on estimation when the subcohort is a simple random sample of size m from a cohort of size n.
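With a simple random subcohort of size m from a cohort of size n, one common case-cohort estimator of the baseline cumulative hazard is a Breslow-type sum. Each case's risk-set denominator is estimated from the subcohort and scaled by the inverse sampling fraction n/m. Absolute risk over (0, t] for covariate value z0 is then 1 - exp(-Λ0(t) exp(β z0)). The Python sketch below assumes this estimator, which is not necessarily the paper's exact estimator. It also assumes one fixed covariate, entry at time 0, and a non-empty subcohort risk set at every case time. It omits the confidence intervals discussed in the paper. The function names are hypothetical.

```python
import math


def case_cohort_breslow(data, beta, n, m, t):
    """Baseline cumulative hazard at time t from a simple random subcohort.

    data: list of (time, event, z, in_subcohort) tuples (iterated repeatedly).
    Each case's denominator is (n / m) times the sum of exp(beta * z) over
    subcohort members still at risk; raises ZeroDivisionError if none are.
    """
    scale = n / m  # inverse of the subcohort sampling fraction
    cumhaz = 0.0
    for tj, ej, _, _ in data:
        if ej and tj <= t:
            s0 = sum(math.exp(beta * zk) for tk, _, zk, sk in data if sk and tk >= tj)
            cumhaz += 1.0 / (scale * s0)
    return cumhaz


def absolute_risk(cumhaz0, beta, z0):
    """Risk of failure by t for covariate value z0 under proportional hazards."""
    return 1.0 - math.exp(-cumhaz0 * math.exp(beta * z0))
```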

Stratified case-cohort studies

Stratification of the subcohort is appealing in order to increase the efficiency of the sample but practical computational methods have not been described. There are two situations requiring different analysis methods. The first is when fitting a stratified Cox model where the strata correspond to the subcohort sampling strata, i.e., the Cox model with separate strata-specific baseline hazards. The second is when the model does not include baseline hazards for each of the subcohort sampling

Discussion

We have described computational methods that deal with a number of practical issues in the analysis of case-cohort studies and provide the full range of analysis options available when analyzing full cohort data, in particular analysis with time-dependent covariates and absolute risk estimation. The methods are easily implemented using standard Cox regression software that computes dfbeta residuals and accommodates an offset in the model.
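The role of dfbeta residuals can be illustrated with a small sketch. Suppose the Cox software returns D, the matrix of per-subject dfbeta residuals, with one row per subject and one column per coefficient. A robust (sandwich-type) variance estimate is then the sum of their outer products, V = D^T D. The function names below are hypothetical. The sketch covers only this robust estimator and not the asymptotic case-cohort variance mentioned in the abstract, which involves additional terms.

```python
import math


def robust_variance(dfbeta):
    """V = D^T D: sum over subjects of the outer product of dfbeta rows."""
    p = len(dfbeta[0])
    v = [[0.0] * p for _ in range(p)]
    for row in dfbeta:
        for a in range(p):
            for b in range(p):
                v[a][b] += row[a] * row[b]
    return v


def robust_se(dfbeta):
    """Robust standard errors: square roots of the diagonal of D^T D."""
    v = robust_variance(dfbeta)
    return [math.sqrt(v[a][a]) for a in range(len(v))]
```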

The main computational problem in the analysis of

References (16)

  • Andersen, P.K., et al., 1993. Statistical Models Based on Counting Processes.
  • Barlow, W., 1994. Robust variance estimation for the case-cohort design. Biometrics.
  • Bie, O., et al., 1987. Confidence intervals and confidence bands for the cumulative hazard rate function and their small sample properties. Scand. J. Statist.
  • Boice, J., et al., 1977. Breast cancer in women after repeated fluoroscopic examinations of the chest. J. Nat. Cancer Inst.
  • Borgan, Ø., et al., 2000. Exposure stratified case-cohort designs. Lifetime Data Anal.
  • Cox, D.R., 1972. Regression models and life-tables (with discussion). J. Roy. Statist. Soc. B.
  • Hrubec, Z., et al., 1989. Breast cancer after multiple chest fluoroscopies: second follow-up of Massachusetts women with tuberculosis. Cancer Res.
  • Jiao, J., 2001. Comparison of variance estimators in case-cohort studies. Ph.D. Dissertation, University of Southern...
There are more references available in the full text version of this article.

Cited by (111)

  • Sugar-containing beverages and their association with risk of breast, endometrial, ovarian and colorectal cancers among Canadian women

    2021, Cancer Epidemiology
    Citation Excerpt :

    Finally, after excluding participants with unusual energy intake (n = 304), and those with a previous history of cancer (n = 121), analyses of colorectal cancer were based on a subcohort of 2633 participants (men and women) and 247 incident colorectal cancer cases. Cox proportional hazards models with the modification for the stratified case-cohort design and with robust variance estimates, as described by Langholz and Jiao [32], were used to estimate hazards ratios (HRs) and 95% confidence intervals (CIs) for the associations of the exposures of interest with risk of breast, endometrial, ovarian and colorectal cancers. Time from enrollment to diagnosis was used as the time-scale.

  • Lipoprotein(a) and Cardiovascular Risk Prediction Among Women

    2018, Journal of the American College of Cardiology

This work was supported by NCI Grant CA42949 and NIEHS Grant 5P30 ES07048.
