Computational methods for case-cohort studies☆
Introduction
Since the introduction of methods for the analysis of case-based sampled failure time data (Prentice, 1986), the “case-cohort” design has become increasingly popular in epidemiologic and clinical research. The theory provides a full range of analytic tools for the Cox proportional hazards model (Cox, 1972), and some of the most important computational challenges have been addressed. In particular, drawing on the theoretical results of Self and Prentice (1988), Barlow (1994) and Lin and Ying (1993), Therneau and Li (1999) (T&L99) solved the computational problem of variance estimation. However, the theory provides further analytic tools for which practical data analysis methods have not been described. In this paper, we provide methods for a number of outstanding computational problems when the subcohort is a simple random sample: fitting Prentice's pseudolikelihood exactly, analyzing data with time-dependent covariates, and estimating absolute risk based on the cumulative hazard with appropriate confidence intervals. Further, we show how the proposed methods can be extended to accommodate stratified random sampling of the subcohort when analyzing either the corresponding stratified or the unstratified Cox model. The methods are easily implemented using standard Cox regression software.
To illustrate the methods, we analyzed simple and age-at-first-exposure-stratified case-cohort samples drawn from a cohort of 1741 female patients who were discharged from two tuberculosis sanatoria in Massachusetts between 1930 and 1956 to investigate breast cancer risk and radiation exposure due to fluoroscopy (Boice and Monson, 1977, Hrubec et al., 1989). Radiation doses were estimated for those women who received radiation exposure to the chest from the X-ray fluoroscopy lung examination. The remaining women received treatments that did not require fluoroscopic monitoring and were unexposed to radiation. Seventy-five breast cancer cases were identified, 54 exposed and 21 unexposed. The case-cohort samples we use to illustrate the methods consist of a subcohort of 100 sampled subjects plus all breast cancer cases who were not sampled. For the simple case-cohort sample, the 100 subjects were randomly sampled without replacement from the entire cohort. For the stratified case-cohort sample, the 100 subjects were sampled from age-at-first-exposure strata (, 15–19, 20–29, ) in numbers proportional to the number of breast cancer cases in the strata. For clarity of presentation, we illustrate the analysis methods using SAS code that is easy to understand but is not convenient for general-purpose analyses. We have written SAS macros that are more suitable for practical use and made them available at http://hydra.usc.edu/timefactors.
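The two sampling schemes described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS code: the function names, the toy data layout (one case flag and one stratum label per subject), and the largest-remainder rounding of the proportional allocation are all assumptions introduced here.

```python
import random
from collections import Counter


def simple_case_cohort(case_flags, m, seed=0):
    """Draw a simple random subcohort of m subjects without replacement,
    then add every case the subcohort missed.
    Returns (subcohort, full case-cohort sample) as sets of indices."""
    rng = random.Random(seed)
    subcohort = set(rng.sample(range(len(case_flags)), m))
    cases = {i for i, is_case in enumerate(case_flags) if is_case}
    return subcohort, subcohort | cases


def proportional_allocation(cases_per_stratum, m):
    """Split m subcohort slots across strata in proportion to case counts.
    Largest-remainder rounding is an assumption; the text does not say
    how fractional allocations were handled."""
    total = sum(cases_per_stratum.values())
    exact = {s: m * c / total for s, c in cases_per_stratum.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = m - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_case_cohort(case_flags, strata, m, seed=0):
    """Sample the subcohort within strata, with stratum sizes proportional
    to the number of cases in each stratum, then add all unsampled cases.
    Strata that contain no cases receive no subcohort members."""
    rng = random.Random(seed)
    cases = {i for i, is_case in enumerate(case_flags) if is_case}
    alloc = proportional_allocation(Counter(strata[i] for i in cases), m)
    subcohort = set()
    for s, m_s in alloc.items():
        members = [i for i, label in enumerate(strata) if label == s]
        if m_s > len(members):
            raise ValueError(f"stratum {s!r} has fewer than {m_s} subjects")
        subcohort |= set(rng.sample(members, m_s))
    return subcohort, subcohort | cases
```

With a toy cohort of 1741 subjects and 75 cases, both functions return a subcohort of exactly 100 subjects; the size of the full sample varies with how many cases happen to fall inside the subcohort.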
Section snippets
Simple (unstratified) case-cohort samples
In this section, we focus on estimation when the subcohort is a simple random sample of size m from a cohort of size n.
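For orientation, the pseudolikelihood of Prentice (1986) that this setting refers to can be written as follows. The notation is an assumption made here for illustration and may differ from the paper's: $d$ cases fail at distinct times $t_1, \dots, t_d$, $i_j$ is the case failing at $t_j$, $\tilde{C}$ is the subcohort, $Y_k(t)$ indicates that subject $k$ is at risk at time $t$, and $Z_k(t)$ is the covariate vector.

```latex
\tilde{L}(\beta) = \prod_{j=1}^{d}
  \frac{\exp\{\beta^{\top} Z_{i_j}(t_j)\}}
       {\sum_{k \in \tilde{R}(t_j)} \exp\{\beta^{\top} Z_k(t_j)\}},
\qquad
\tilde{R}(t_j) = \{k \in \tilde{C} : Y_k(t_j) = 1\} \cup \{i_j\}
```

The risk set at each failure time consists of the subcohort members still at risk plus the failing case itself, so a case outside the subcohort contributes to a denominator only at its own failure time.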
Stratified case-cohort studies
Stratifying the subcohort is appealing because it can increase the efficiency of the sample, but practical computational methods have not been described. There are two situations requiring different analysis methods. The first is when fitting a stratified Cox model where the strata correspond to the subcohort sampling strata, i.e., the Cox model with separate stratum-specific baseline hazards. The second is when the model does not include baseline hazards for each of the subcohort sampling
Discussion
We have described computational methods that deal with a number of practical issues in the analysis of case-cohort studies and provide the full range of analysis options available when analyzing full cohort data, in particular analysis with time-dependent covariates and absolute risk estimation. The methods are easily implemented using standard Cox regression software that computes dfbeta residuals and accommodates an offset in the model.
The main computational problem in the analysis of
References (16)
- Andersen et al. Statistical Models Based on Counting Processes (1993)
- Barlow. Robust variance estimation for the case-cohort design. Biometrics (1994)
- Bie et al. Confidence intervals and confidence bands for the cumulative hazard rate function and their small sample properties. Scand. J. Statist. (1987)
- Boice and Monson. Breast cancer in women after repeated fluoroscopic examinations of the chest. J. Nat. Cancer Inst. (1977)
- Borgan et al. Exposure stratified case-cohort designs. Lifetime Data Anal. (2000)
- Cox. Regression models and life-tables (with discussion). J. Roy. Statist. Soc. B (1972)
- Hrubec et al. Breast cancer after multiple chest fluoroscopies: second follow-up of Massachusetts women with tuberculosis. Cancer Res. (1989)
- Jiao, J., 2001. Comparison of variance estimators in case-cohort studies. Ph.D. Dissertation, University of Southern...
☆ This work was supported by NCI Grant CA42949 and NIEHS Grant 5P30 ES07048.