Cardiovascular disease risk assessment in older women: can we improve on Framingham? British Women’s Heart and Health prospective cohort study
M May (1), D A Lawlor (1), P Brindle (1), R Patel (1), S Ebrahim (2)

(1) Department of Social Medicine, University of Bristol, Bristol, UK
(2) Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK

Correspondence to: Margaret May, Department of Social Medicine, University of Bristol, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK; m.t.may{at}


Objectives: To develop a cardiovascular risk assessment tool that is feasible and easy to use in primary care (general practice (GP) model).

Design: Prospective cohort study.

Setting: 23 towns in the United Kingdom.

Participants: 3582 women aged 60 to 79 years who were free of coronary heart disease (CHD) at entry into the British Women’s Heart and Health Study.

Main outcome measures: Predictive performance of a GP model compared with the standard Framingham model for both CHD and cardiovascular disease (CVD).

Results: The Framingham tool predicted CHD events over 5 years accurately (predicted 5.7%, observed 5.5%) but overpredicted CVD events (predicted 10.5%, observed 6.8%). In higher-risk groups, Framingham overpredicted both CHD and CVD events and was poorly calibrated for this cohort. Including C-reactive protein and fibrinogen with standard Framingham risk factors did not improve discrimination of the model. The GP model, which used age, systolic blood pressure, smoking habit and self-rated health (all of which can be easily obtained in one surgery visit), performed as well as the Framingham risk tool: the area under the receiver operating characteristic curve (AUROC), the discrimination statistic, was 0.66 (95% confidence interval (CI) 0.62 to 0.70) for CHD and 0.67 (95% CI 0.64 to 0.71) for CVD, compared with 0.65 (95% CI 0.61 to 0.68) and 0.66 (95% CI 0.62 to 0.69) for the corresponding Framingham models.

Conclusions: An alternative risk assessment based on only a simple routine examination and a small number of pertinent questions may be more useful in the primary care setting. This model appears to perform well but needs to be tested in different populations.

  • AUROC, area under the receiver operating characteristic curve
  • CHD, coronary heart disease
  • CVD, cardiovascular disease
  • GP, general practice


Primary prevention of cardiovascular disease (CVD) involves the identification of people at greatest risk and the targeting of advice and treatments at this group. Most methods use the Framingham score,1 which requires blood tests and an ECG. Risk scoring has been dominated by the accuracy of model prediction and not by feasibility of application in primary care, where most primary prevention takes place. Furthermore, aetiological factors that may not necessarily be the best prognostic factors or those that are most readily modified have been overemphasised.

Our main objective was to construct a primary care model that would work as well as the Framingham score but would not require an ECG or blood tests. An assessment that requires the primary care professional to undertake only simple routine examinations and ask a few pertinent questions would have obvious advantages over one requiring blood tests and a more time-consuming examination. Such an assessment could be carried out in the surgery with the results, and hence the ability to begin an appropriate prevention plan, immediately available when the patient has been motivated to attend.

A secondary objective was to explore the addition of C-reactive protein and fibrinogen concentrations to the Framingham model, as some authors have claimed that this would improve its performance.2–4


The British Women’s Heart and Health Study

Full details of the selection of participants and measurements used in the study have been reported.5,6 Between 1999 and 2001, 4286 women aged 60–79 years who were randomly selected from 23 British towns were interviewed and examined, completed medical questionnaires and had detailed reviews of their medical records.5 These women have been followed up over a median of 4.7 years by flagging with the National Health Service (NHS) central register for mortality data and two-yearly review of their medical records. Local ethics committees’ approvals were obtained for the study.

Assessment of prevalent and incident coronary heart disease

Methods used at baseline assessment have been described.5,6 Prevalent coronary heart disease (CHD) at baseline was defined as a woman with either of the following: (1) a medical record of a myocardial infarction (defined according to World Health Organization criteria), angina, coronary artery bypass or angioplasty; or (2) a self-report that a doctor had ever diagnosed a heart attack or angina. Prevalent stroke and diabetes were similarly defined as a medical record of a diagnosis or a self-report of a doctor’s diagnosis. Including both self-reports and data from medical records provides greater assurance that all women with baseline disease have been excluded from the prospective analyses. Incident cases of CHD in women who were free of prevalent disease at baseline were defined as either death with an underlying or contributing cause of CHD (International Classification of Diseases, 10th revision codes I20–I25, I51.6) or a myocardial infarction, diagnosis of angina, or coronary artery bypass or angioplasty identified in the follow-up medical record review. Incident CVD (CHD or stroke) was similarly defined as freedom from disease at baseline and either death with an underlying or contributing cause coded as I20–I25, I51.6, I60–I69 or G45, or a new CHD or stroke event identified in the woman’s medical record review. The censoring date for events was 31 December 2004.

Measurement of predictors

Information on smoking and on self-rated general health, which was assessed with four prespecified categories (excellent, good, fair and poor), was obtained from the nurse interview or self-completed questionnaire at baseline.5,6 Blood samples were taken after a minimum 6 h fast. These samples were used to assess lipids by standard procedures.6 Fibrinogen was assayed in stored citrated plasma by the Clauss assay in an MDA-180 automated coagulometer (Organon Teknika). C-reactive protein was assessed by a high-sensitivity immunonephelometric assay on a ProSpec protein analyser (Dade-Behring) as previously described.7 Blood pressure, height and weight (used to calculate body mass index) were measured by standard procedures.6 A resting 12-lead ECG was recorded for each woman at baseline assessment, with left ventricular hypertrophy defined according to Minnesota codes (3-1 or 3-3).8

Statistical methods

Assessment of the performance of different risk models

We assessed both the calibration and discrimination of the prognostic models.9 Calibration measures the accuracy of the model predictions and was assessed by comparing mean predicted risk with observed incidence of the outcome overall and in groups of participants classified by level of risk. Discrimination is the ability to rank subjects in order of risk such that those who experience the event of interest have a higher predicted risk than those who do not experience it. If discrimination is poor, then some people above the treatment threshold will receive treatment unnecessarily and some people below it will experience CVD without having received preventive treatment. If calibration is poor, then the risk thresholds for treatment will be set higher or lower than they should be to achieve the primary prevention targets.

Test of the predictions from the published Framingham models

The published Framingham equations1 for CHD and CVD events were used to predict risk of an event occurring during the time between the baseline survey and the first follow up for each woman. The predicted risk was used to rank the women and classify them by fifths of risk. The mean predicted risk for each fifth and the overall risk were compared with the corresponding observed number of events. The ratio of predicted risk to observed incidence was calculated to see whether the Framingham score overpredicted or underpredicted risk in our study. Receiver operating characteristic analysis, plotting sensitivity and 1 − specificity of risk predictions (see appendix), was used to assess the discrimination of the risk score.
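The calibration check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names `calibration_by_fifths` and `overall_ratio` are hypothetical, and the inputs are assumed to be a list of predicted risks and a matching list of 0/1 event indicators.

```python
from statistics import mean

def calibration_by_fifths(predicted, observed):
    """Rank subjects by predicted risk, split them into fifths, and compare
    mean predicted risk with observed incidence in each fifth.

    predicted: predicted risks between 0 and 1 (at least five subjects)
    observed:  0/1 event indicators in the same order
    Returns (mean_predicted, observed_incidence, ratio) per fifth, lowest-risk
    fifth first; ratio is None when no events were observed in that fifth.
    """
    pairs = sorted(zip(predicted, observed))  # rank by predicted risk
    n = len(pairs)
    rows = []
    for k in range(5):
        group = pairs[k * n // 5:(k + 1) * n // 5]
        mean_pred = mean(p for p, _ in group)
        incidence = mean(e for _, e in group)
        ratio = mean_pred / incidence if incidence > 0 else None
        rows.append((mean_pred, incidence, ratio))
    return rows

def overall_ratio(predicted_risk, observed_incidence):
    """Ratio of predicted risk to observed incidence; values above 1 mean overprediction."""
    return predicted_risk / observed_incidence
```

With the overall CVD figures reported in this study, `overall_ratio(10.5, 6.8)` is about 1.54, which corresponds to the 54% overprediction.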

Comparison between the general practice model and the Framingham model

Weibull proportional hazards survival models and standard methods for assessing calibration and accuracy of model predictions were used to see whether non-Framingham risk factors for CVD could be used either in addition to or instead of the standard Framingham risk factors. Multiple imputation methods were used to deal with missing data, facilitating comparisons across the different models (see appendix).
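As a rough illustration of how a Weibull proportional hazards model turns risk factors into a predicted risk, the sketch below uses one common parameterisation, S(t | x) = exp(-scale * t^shape * exp(linear predictor)). The article does not report the parameterisation or the fitted baseline parameters, so the function name and the arguments `scale`, `shape` and `linear_predictor` are assumptions made for the example.

```python
import math

def weibull_ph_risk(t, linear_predictor, scale, shape):
    """Predicted risk of an event by time t under a Weibull proportional
    hazards model.

    Assumed parameterisation (not taken from the article):
        S(t | x) = exp(-scale * t**shape * exp(linear_predictor))
    where linear_predictor is the sum of coefficient * risk factor value,
    so exp(coefficient) is the hazard ratio for a one-unit change.
    """
    cumulative_hazard = scale * t ** shape * math.exp(linear_predictor)
    return 1.0 - math.exp(-cumulative_hazard)
```

Under this form, evaluating the function at each woman's own follow-up time gives a predicted risk that accounts for follow-up shorter than 10 years.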


After removal of women with baseline CHD or CVD, 3582 were available for the analyses with CHD (704 women with baseline disease removed) as the outcome and 3511 with CVD (775 women with baseline disease removed) as the outcome. Table 1 shows the mean (SD) (or proportions) of the risk factors in the 3582 women together with the number of women with imputed data.

Table 1

 Risk factors for coronary heart disease in 3582 women

No variables had more than 11% imputed data and the five imputed data sets were statistically similar to the actual data. Results for the analyses including imputed data were similar to the complete case analyses.

Test of the predictions from the published Framingham models

Table 2 shows the predicted Framingham risk and observed incidence of CHD and CVD events with the ratio of predicted to observed for fifths of risk and overall.

Table 2

 Predicted Framingham risk and observed incidence of CHD and CVD events by fifths of risk

Follow-up time was an average of 4.7 years (range 3.4–5.7 years). For CHD the predicted risk of an event was 5.7% and the observed incidence was 5.5% (198 cases), an overprediction of 3%. However, the Framingham score underpredicted in the low-risk fifths and overpredicted in the highest-risk fifth. For CVD, the predicted risk of an event was 10.5% and the observed incidence was 6.8% (240 cases), an overprediction of 54%. The overprediction was greatest in the two highest-risk fifths. The discrimination (area under the receiver operating characteristic curve (AUROC)) of the Framingham model was 0.59 for CHD and 0.62 for CVD events when the women were classified by fifths of risk and 0.63 and 0.64, respectively, for the ranked risk.

Comparison of the discrimination of candidate risk prediction models

Table 3 shows the hazard ratios (95% confidence intervals (CIs)) for CHD and CVD events comparing a model with the standard Framingham risk factors with one that also included log C-reactive protein and fibrinogen. In this dataset, C-reactive protein and fibrinogen predicted CHD in univariable models, but the hazard ratios were attenuated in the multivariable model. Including C-reactive protein and fibrinogen did not improve discrimination for either outcome.

Table 3

 Hazard ratios (95% CI) for CHD and CVD events estimated from models fitted on BWHHS data using Framingham risk factors (model 1) and with addition of C-reactive protein and fibrinogen (model 2)

The proposed general practice (GP) model included the standard risk factors of age, systolic blood pressure and smoking status but not cholesterol ratio, diabetes and left ventricular hypertrophy, as these require laboratory tests or an ECG. The alternative risk factors considered for inclusion in the GP model were body mass index or waist measurement and self-rated health (excellent, good, fair or poor). In contrast with the published Framingham model, a four-category classification of smoking was used in the GP model: never smoker, former smoker and current smoker of either 0–9 or 10 or more cigarettes a day. Self-rated health was a particularly strong predictor of events with a hazard ratio for “poor” compared with “excellent” of 9.6 (95% CI 4.1 to 22.9) for CHD and 11.4 (95% CI 5.1 to 25.6) for CVD. Body mass index was not an independent predictor of CHD or CVD.

Table 4 shows the hazard ratios (95% CI) for the prognostic factors in the GP models for both outcomes with the AUROCs and the p value for the difference in discrimination of the models compared with the Framingham model. Discrimination appeared to be marginally better with GP model 2, but CIs of the AUROCs for all GP models and for the corresponding Framingham model overlap.

Table 4

 Hazard ratios (95% CI) for CHD and CVD events for each of four GP models

Tables 2–4 also show the sensitivity and specificity at 30% and 15% 10-year risk thresholds for the published Framingham model and for the models estimated on this cohort, adjusted for shorter follow-up time. As the sensitivity and specificity of the published Framingham equations are different from those estimated in the model with Framingham risk factors fitted to this cohort, the Framingham risk score is not well calibrated to this population at the important treatment decision-making thresholds.


In a group of older British women we found that a simple risk assessment based on age, systolic blood pressure, smoking habit and self-rated health performed as well as the Framingham risk assessment. The addition of C-reactive protein or fibrinogen did not improve the performance of the Framingham equation.10–12 The Framingham risk assessment overpredicted, particularly for CVD, and particularly in those at higher risk. This pattern has been shown in a number of other studies of different populations in which the Framingham equation has been assessed13,14 and may result from variations in the performance of the equation in different populations or overfit of the model to the data in which it was originally developed.15 Owing to regression to the mean, one would anticipate the pattern of overprediction in high-risk and underprediction in low-risk groups when a prediction model generated in one dataset is applied to an independent dataset.

Limitations of this study

Variables for inclusion in the GP model were selected a priori. As the model is being used to predict in the same dataset that was also used to estimate the coefficients, predictive performance is likely to be overoptimistic—that is, the discrimination or the calibration of the model may be expected to worsen if it were used for prediction in independent data.16 Further work is required to produce a model that would generalise well to all British women, as well as other populations, including validation of any such model in the primary care setting.17 Prediction models, like any screening test, should be evaluated in randomised trials to determine whether they are effective.

Ideally, assessment of 10-year risk of CVD events should be based on 10 years of follow up, but only five years of follow up was available in our study. This probably did not bias the results substantially, as follow-up time was taken into account in the statistical methods. Events might have been somewhat underascertained close to the end of follow up, which might have artificially contributed to the overprediction of events by the Framingham equations. For CHD, however, it was the distribution of events over the fifths of risk that did not match, rather than the total number of events. CVD events were particularly overpredicted in our study. Results could have been affected by differences in event definition and ascertainment in Framingham compared with this cohort. Furthermore, the Framingham risk score was not designed for patients more than 74 years old. Primary prevention in this age group is nevertheless important, as is our attempt to develop an easy to use and accurate prediction tool for it.

Risk factors measured in epidemiological studies may not adequately reflect the methods of measurement in routine primary care, where there may be more repeated measurements but less standardised measurement. Our model depends on the validity of the self-rated health question. Research from the British Regional Heart Study has indicated that it is a reliable instrument.18 Self-rated health recorded during opportunistic screening, in particular when someone presents to their GP with symptoms, may be very different from our assessment in this study. However, in primary prevention screening clinics, where patients are not presenting because of ill health, we expect our model would perform in a similar way to how it performed in our study.


Most work to date on the performance of prediction tools has concentrated on accuracy rather than feasibility. The focus of the modelling has not been on prognostic factors, but rather on aetiological factors. Our findings suggest that a simple model based on age, systolic blood pressure, smoking habit and self-rated health performs well for the prediction of CVD in older British women. Further research is required to examine the performance of this model in other independent datasets and different populations, but if it is found to perform well in these studies it may improve risk detection and primary prevention. The Framingham tool has not been widely used,19 possibly because it requires blood tests and ECG assessment. A tool that requires only simple tests and a few questions may be more widely used. Our GP model has the advantage over a model requiring blood tests and ECG that it would provide immediate results. Thus, primary prevention advice and treatment can begin sooner and do not require the patient to return for a second visit to discuss their results (with the risk that some patients will not return for this). On the other hand, risk assessment based on blood tests and ECG may be a more powerful motivator to comply with treatment. Ultimately, a randomised trial is required to compare the costs and effects of the different methods on primary prevention.

The Framingham equations are not well calibrated to this population of elderly British women, particularly in high-risk groups, where the decision thresholds for treatment lie, nor do they have high discrimination. If the Framingham tool continues to be used, then the threshold for treatment specified in the Department of Health guidelines should be revised downwards for older women to increase sensitivity of the risk assessment.

The nature of the prognostic variables used in prediction models will, to a certain extent, drive the focus of the intervention offered to patients found to be at high risk. For example, if cholesterol is measured and found to be high, then statins may be prescribed, but if a model uses body mass index as a predictor of CHD, then advice on lifestyle changes in diet and exercise is more likely to be given. Research on the acceptability, effectiveness and cost of different interventions in this population can inform the selection of prognostic variables for a more appropriate risk assessment tool. Furthermore, on economic grounds, using a two-stage screening process would be advantageous. A cheap screening tool such as the GP model can be used to screen out those patients with low risk. In the second stage, the more expensive and invasive blood tests can be offered to the remaining patients to determine who is at highest risk and would benefit most from drug treatments.



To reduce bias and to allow comparisons of regression models based on all subjects, we used multiple imputation of missing data with switching regression20,21 rather than complete case analysis. All prognostic variables, log of the survival times for coronary heart disease (CHD) and cardiovascular disease (CVD), and both censoring indicators were used in the imputation regressions. After imputation, five imputed data sets were analysed and the results combined appropriately by using Rubin’s rules.22 Women with prevalent CHD at baseline were excluded from all regression analyses and those with a previous diagnosis of stroke were also excluded from analyses with CVD as the end point. The point estimates were similar when the analysis was repeated on the subset of subjects with complete data.
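The combination step can be illustrated with a minimal sketch of Rubin's rules for a single coefficient. The function name `pool_rubin` is hypothetical, and the sketch assumes each imputed data set yields a point estimate (for a hazard ratio, typically on the log scale) and its squared standard error.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets with Rubin's rules.

    estimates: point estimates, one per imputed data set (m >= 2), for
               example log hazard ratios (the log scale is an assumption here)
    variances: the corresponding squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)           # average of the m point estimates
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The square root of the total variance gives the pooled standard error, from which a confidence interval can be formed and then exponentiated back to the hazard ratio scale.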


The receiver operating characteristic (ROC) curve is a graph of sensitivity versus 1 − specificity of the classification. The area under the ROC curve (AUROC) can be interpreted as the probability that a randomly selected subject who experiences the event has a higher predicted risk than a randomly selected person who does not experience the event. Overall discrimination of the Framingham models was assessed with the AUROC. An area of 0.5 would be expected by chance allocation, whereas a good prediction model would have an AUROC of 0.75 or higher. For comparison, the maximum possible AUROC for these data and models, calculated with the ranked risk scores and thus independent of any choice of classification threshold, was determined.
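The probabilistic interpretation of the AUROC can be made concrete by counting pairs directly. The sketch below is illustrative rather than the method used in the analysis: the function name `auroc` is hypothetical, and counting tied risks as half a concordant pair is a common convention assumed here.

```python
def auroc(risks, events):
    """Area under the ROC curve computed as the proportion of
    (event, non-event) pairs in which the subject with the event has the
    higher predicted risk; ties count as half.

    risks:  predicted risks, one per subject
    events: 0/1 indicators (1 = experienced the event), same order;
            both outcomes must be present
    """
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

If every subject is given the same predicted risk, all pairs are ties and the function returns 0.5, the value expected by chance allocation.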

We calculated the AUROC on the ranked risk scores predicted by the candidate models and compared them with the AUROC of the model that included only the standard Framingham risk factors. To determine how well the published Framingham equations and the candidate Weibull prognostic models could classify the women into high and low-risk groups, we calculated the sensitivity and specificity of the model predictions for both CHD and CVD events at 30% and 15% 10-year risk thresholds.
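Sensitivity and specificity at a risk threshold follow from a simple cross-classification, sketched below. The function name is hypothetical, treating a predicted risk at or above the threshold as high risk is an assumption, and the sketch expects risks and threshold to be on the same time scale (the adjustment for shorter follow-up is not shown).

```python
def sensitivity_specificity(risks, events, threshold):
    """Classify subjects as high risk when predicted risk >= threshold and
    return (sensitivity, specificity).

    risks:     predicted risks on the same time scale as the threshold
    events:    0/1 indicators (1 = experienced the event), same order;
               both outcomes must be present
    threshold: risk cut-off, for example 0.30 or 0.15
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for risk, event in zip(risks, events):
        high_risk = risk >= threshold
        if event and high_risk:
            true_pos += 1
        elif event:
            false_neg += 1
        elif high_risk:
            false_pos += 1
        else:
            true_neg += 1
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

Lowering the threshold can only move subjects from the low-risk to the high-risk group, so sensitivity rises or stays the same while specificity falls or stays the same.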


We thank all of the general practitioners and their staff who supported data collection and the women who participated in the study.



  • Published Online First 17 March 2006

  • The British Women’s Heart and Health Study is funded by the UK Department of Health and British Heart Foundation. MM is funded by the British Heart Foundation and the Medical Research Council. DAL is funded by a UK Department of Health Career Scientist Award. The funding bodies have not influenced any aspects of the study design, analysis or interpretation of results. The views expressed in this publication are those of the authors and not necessarily those of any of the funding bodies.

  • Competing interests: None declared.

  • Contributions: Margaret May developed the study aim, undertook all statistical analyses and wrote the first draft of the paper. Debbie Lawlor co-directs the British Women’s Heart and Health Study, thought of and developed the study aim and contributed to writing the paper. Peter Brindle developed the study aim and contributed to writing the paper. Rita Patel managed the British Women’s Heart and Health Study database and contributed to writing the paper. Shah Ebrahim is the principal investigator of the British Women’s Heart and Health Study, thought of and developed the study aim and contributed to writing the paper.
