Objective: To determine the accuracy of assessing cardiovascular disease (CVD) risk in the primary prevention of CVD and its impact on clinical outcomes.
Design: Systematic review.
Data sources: Published studies retrieved from Medline and other databases. Reference lists of identified articles were inspected for further relevant articles.
Selection of studies: Any study that compared the 10-year risk of coronary heart disease (CHD) or CVD predicted by the widely recommended Framingham methods with the observed 10-year risk (review A). Randomised controlled trials examining the effect on clinical outcomes of a healthcare professional assigning a cardiovascular risk score to people predominantly without CVD (review B).
Review methods: Data were extracted on the ratio of the predicted to the observed 10-year risk of CVD and CHD (review A), and on cardiovascular or coronary fatal or non-fatal events, risk factor levels, absolute cardiovascular or coronary risk, prescription of risk-reducing drugs and changes in health-related behaviour (review B).
Results: 27 studies with data from 71 727 participants on predicted and observed risk for either CHD or CVD were identified. For CHD, the predicted to observed ratios ranged from an underprediction of 0.43 (95% CI 0.27 to 0.67) in a high-risk population to an overprediction of 2.87 (95% CI 1.91 to 4.31) in a lower-risk population. In review B, four randomised controlled trials confined to people with hypertension or diabetes found no strong evidence that a cardiovascular risk assessment performed by a clinician improves health outcomes.
Conclusion: The performance of the Framingham risk scores varies considerably between populations and evidence supporting the use of cardiovascular risk scores for primary prevention is scarce.
- CHD, coronary heart disease
- CVD, cardiovascular disease
- INSIGHT, Intervention as a Goal in Hypertension Treatment
Assessing a person’s cardiovascular risk has become the accepted way of targeting preventive treatment at patients who are asymptomatic but at high risk of cardiovascular disease (CVD). Multivariate risk functions derived in several cohort studies and randomised trials form the basis of predictive functions and risk scores.1–4 Many, especially those derived from the Framingham Heart Study, have been adapted for use in primary care as simplified charts, tables, computer programs and web-based tools, and are routinely recommended in policy documents and guidelines.5–8 Depending on their absolute risk, asymptomatic people may be offered blood pressure and cholesterol-lowering treatment and aspirin, in addition to advice about relevant health behaviours. Such interventions may be lifelong and are associated with risks as well as benefits.
Cardiovascular risk scores, like clinical prediction rules, help clinicians prioritise treatment and should be subject to evaluation before implementation. The predictive performance of the risk score needs to be examined in different populations, and then its clinical impact must be assessed by means of a randomised controlled trial.9 For the risk-scoring approach to be a viable strategy for primary prevention, it should favourably influence people’s risk of disease or risk factors or, in the absence of such information, increase prescription of effective preventive treatments to appropriate patients.
The objectives of this study were to systematically review: (1) the external validity—that is, the extent to which predicted risk assessments accurately reflected observed risk—of widely recommended Framingham risk scores in different populations; and (2) the randomised controlled trials that have evaluated the effectiveness of risk-scoring methods for improving CVD-related outcomes.
For the two systematic reviews, a common literature base was identified and a search strategy was designed to find all studies of the external validity and clinical impact of cardiovascular risk scores. The scope of the review was to determine how well any of the relevant prediction models or scores perform in terms of observed event rates compared with predicted event rates in different settings and populations. For evaluations, only randomised trials were considered sufficiently robust to determine the unbiased and unconfounded effects of risk factor scoring on clinical outcomes.
Table 1 details the terms used to search Medline; the search syntax was adapted as appropriate for the other databases. The Cochrane controlled trials register (CENTRAL), Medline, Embase, CINAHL, PsycINFO, ISI Proceedings and ZETOC were searched from database inception to September 2004. Reference lists of articles were searched to identify additional relevant reports and key journals were hand searched. No language or publication-year restrictions were applied, and articles were translated when necessary. Articles were incorporated into a Reference Manager database (Thomson ResearchSoft, Carlsbad, California, USA).
Abstract screening, data extraction and inclusion criteria
Titles and abstracts were initially screened by two reviewers (ADB, PB), and potentially relevant articles were acquired and independently read by the reviewers who also extracted and checked relevant data. Authors of studies with insufficient information were contacted.
External validity—review A
When the external validity of the Framingham risk score was examined, information was extracted on the patient characteristics of the test dataset, as well as risk factors included in the risk score, the disease outcomes, prediction period and statistical methods.
We reviewed studies that evaluated calibration by comparing the risk of coronary heart disease (CHD) or CVD predicted by Framingham risk scores with the risk observed in the test population. A model is perfectly calibrated if the predicted risk of a person or a group of people is the same as the observed risk. The predicted and observed risks for all the studies were calculated with the number of events as the numerator and the number of participants as the denominator. For easy comparison between studies of different follow-up periods, the observed risk for each study was presented as a 10-year risk.
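The calibration measure described above can be illustrated with a short Python sketch. The review does not state how observed risks from shorter or longer follow-up periods were converted to 10-year risks, so the constant event rate used here is an assumption made only for illustration, and the function names and cohort numbers are hypothetical.

```python
def ten_year_risk(events, participants, follow_up_years):
    """Convert an observed cumulative risk to a 10-year risk.

    Assumption (not taken from the review): the event rate is constant
    over time, so survival scales as S(10) = S(T) ** (10 / T).
    """
    risk = events / participants
    return 1.0 - (1.0 - risk) ** (10.0 / follow_up_years)


def predicted_to_observed(predicted_events, observed_events,
                          participants, follow_up_years):
    """Ratio of predicted to observed 10-year risk.

    predicted_events is the number of events the risk score expects
    over 10 years in the same participants.
    """
    predicted = predicted_events / participants
    observed = ten_year_risk(observed_events, participants, follow_up_years)
    return predicted / observed


# Hypothetical cohort: 1000 people, 50 events observed over 5 years,
# 120 events predicted by the score over 10 years.
ratio = predicted_to_observed(120, 50, 1000, 5)
print(round(ratio, 2))  # 1.23
```

A ratio above 1 indicates overprediction and a ratio below 1 indicates underprediction; a perfectly calibrated score gives a ratio of 1.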
For the second review, any published randomised controlled trial that assessed the effectiveness of a healthcare professional using a cardiovascular risk score to aid primary prevention was considered. Control patients were required to have received usual care as provided by a general practitioner or healthcare professional with appropriate treatment and lifestyle recommendations based on current practice. The participants of studies were not subject to any age, sex or nationality exclusion criteria, but were required to be predominantly free from symptomatic CVD (less than 20% of the population studied with clinically established CVD). Patients with diabetes or raised risk factors, or who were receiving preventive treatment, were eligible. Studies were required to provide data on at least one of the following outcomes: cardiovascular or coronary fatal or non-fatal events, risk factor levels, absolute cardiovascular or coronary risk, the prescription of risk-reducing drugs and changes in health-related behaviour such as smoking. Information on the methodological quality of the trials including the method of randomisation, concealment of allocation, baseline group comparisons and blind outcome assessment was collected. Disagreements were resolved by discussion and, if necessary, in consultation with members of the project advisory panel.
We identified a total of 3439 articles, of which 996 were considered potentially relevant to cardiovascular risk assessment and were acquired for assessment (fig 1). We found 52 studies examining the external validity of four Framingham risk scores1,3,10,11 in 112 different population groups, of which 34 provided data on predicted and observed risk for combined fatal and non-fatal CHD or CVD outcomes. The more recent Framingham methods based on these outcomes described by Anderson et al1 and Wilson et al3 form the basis of widely recommended charts, tables and computer programs. These were subject to validation in 27 population groups and are reported here. Seven studies investigating the validity of two older risk scores not used in clinical practice were excluded,10,11 as were studies reporting only fatal outcomes.
A further 26 studies that examined the issue of effectiveness of risk-scoring methods were found, of which four were randomised controlled trials.
External validity—review A
Table 2 (references 12–28) shows the characteristics of patient groups with predicted to observed ratios based on the Framingham Anderson and Wilson methods. The populations were derived from cohort studies, randomised controlled trials or health checks, or were studies of specific patient groups. Populations varied in age range and sex, date of recruitment and outcomes studied. The groups studied were representative samples of men and women, and people with diabetes, raised cholesterol, treated hypertension, no CHD determined by angiography and a family history of CVD. The start of baseline data collection in the studies ranged from 1961 to 1996. Outcomes were combined fatal and non-fatal CHD or combined fatal and non-fatal CVD.
Figure 2 shows predicted to observed ratios in populations ordered by level of observed risk of fatal and non-fatal CHD and CVD. No summary estimate is presented due to the considerable heterogeneity between the studies as indicated by the large χ² and I² scores. For CHD, the predicted to observed ratios ranged from an underprediction of 0.43 in a study of people with a family history of CHD12 to an overprediction of 2.87 in women from Munster.13 Underprediction was observed in studies of higher-risk patients, specifically patients with diabetes14 and a family history of premature CHD12,15 and in a higher-risk UK primary care population.16 For CVD, there was a similar trend of increasing underprediction with increasing risk of the population (the INSIGHT (Intervention as a Goal in Hypertension Treatment) trial excepted), although the range from maximum overprediction to maximum underprediction was less than that for the CHD outcome. This reflects the smaller number of studies available and the narrower range of background 10-year risk between them. The INSIGHT trial compared the effectiveness of two different hypertension treatment regimens and was an exception to this trend, probably because all the participants received blood pressure-lowering drugs and many were also taking concurrent cholesterol-lowering drugs, variables not included in the Anderson equation.17
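For readers unfamiliar with the heterogeneity statistics mentioned above, the following Python sketch shows the usual definitions: Cochran's Q (the heterogeneity chi-squared statistic) and I², the percentage of variation across studies beyond that expected by chance. It assumes the ratios are analysed on the log scale with inverse-variance weights, which the review does not specify, and the three study values are hypothetical rather than taken from figure 2.

```python
import math


def heterogeneity(log_ratios, standard_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (%)."""
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_ratios)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ratios))
    df = len(log_ratios) - 1
    # I-squared is truncated at zero when Q is below its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical predicted-to-observed ratios from three studies,
# each with a standard error of 0.1 on the log scale.
ratios = [0.5, 1.0, 2.0]
q, df, i_squared = heterogeneity([math.log(r) for r in ratios], [0.1] * 3)
print(f"Q = {q:.1f} on {df} df, I-squared = {i_squared:.0f}%")
```

When I² is close to 100%, as in this made-up example, almost all of the spread between studies reflects real differences rather than sampling error, and a single pooled estimate would be hard to interpret.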
Table 3 shows the study characteristics of the four randomised controlled trials. Three of the studies included patients with a predefined diagnosis of hypertension,29–31 and the other comprised exclusively patients with diabetes.32 Two of them used computerised clinical decision support systems,29,30 and the others informed doctors of the patient’s risk either directly31 or by recording it prominently in the medical notes.32 The risk scores used were based on the Framingham Anderson 1991 “all CVD events” equation1,29,31,32 or the Westlund Score derived in a Norwegian population.30 Outcomes related to absolute risk, treatment, referral and changes in risk factor levels.
Hall et al32 recruited 167 men and 156 women with type 2 diabetes attending a hospital outpatients’ clinic in Dundee, Scotland, and allocated 162 of them to an intervention group and 161 to a control group. The intervention group had the cardiovascular risk score documented on the front of the notes and the control group did not. The authors found that overall the intervention and control groups did not differ in change of diabetes treatment, change in hypertension drugs, change in lipid-lowering drugs or referral to a dietician. However, they noted that within a high-risk subgroup of patients (> 20% five-year risk) those in the intervention group were more likely to be prescribed blood pressure-lowering (23% v 10%) and lipid-lowering drugs (20% v 9%) than in the control group (p = 0.01 for both comparisons).
Montgomery et al29 used a cluster randomised controlled trial design with 614 patients from 27 general practices in Avon, England. Patients were randomly allocated to a computerised clinical decision support system plus cardiovascular risk chart; cardiovascular risk chart alone; or usual care. The authors found no differences between the computerised clinical decision support system plus chart group and the usual care group, but the chart-only group had significantly lower systolic blood pressure and was more likely to be prescribed cardiovascular drugs than the control group. Information on adherence to the intervention by the doctors and nurses was not supplied.
Hanon et al31 randomly assigned 1526 patients with hypertension from 953 general practitioners in France to two groups, where one group of general practitioners were told the patients’ calculated risk and the other group were not. They found no difference between the two groups in the final blood pressure, 10-year CVD risk or proportions prescribed two hypertension drugs compared with monotherapy.
Hetlevik et al30 offered a computerised clinical decision support system to 17 Norwegian health centres in the intervention group, and the general practitioners in the control group practised usual care. They found no clinically significant difference in blood pressure or total cholesterol between the two groups at the end of 21 months’ follow up. Despite the doctors having an average of 1.5 h of training on the clinical decision support system, it had been used in the treatment of only 12% of the patients in the intervention group.
As few trials were found, none that met the inclusion criteria were excluded because of their study quality. The information reported was limited, making formal comparison with set quality criteria difficult (table 3).
This systematic review has shown that the accuracy of the Framingham risk scores cannot be assumed and that it relates to the background risk of the population to which they are being applied. We have also found no strong evidence supporting the assumption that cardiovascular risk assessment performed by a clinician improves health outcomes. Screening of the population with Framingham-based risk-scoring methods continues to be recommended in current guidelines in the UK and elsewhere.6,7,33,34 The lack of evidence supporting the effectiveness of risk scores and the variable accuracy of the screening methods is of concern.
We found only four randomised controlled trials that had investigated the effectiveness of cardiovascular risk-scoring methods, in contrast to the volume of information about the accuracy of risk prediction with 52 studies examining the external validity of the Framingham risk scores. In particular, no studies included people without hypertension or diabetes—the patients who often require a cardiovascular risk assessment to determine need for drug treatments. The two studies that used computerised clinical decision support systems showed very poor uptake by the doctors in one trial30 and a negative effect when added to a risk chart in the other,29 suggesting that including clinicians in the design of decision aids may improve their use.
Strengths and limitations
This review used a sensitive search strategy with no language restrictions, and it was performed according to standard Cochrane review methods. Comparison of the studies assessing the external validity of the Framingham scores was difficult. There is no standard format for applying them or assessing their quality, and each of the studies had slightly different inclusion criteria, methods of case ascertainment and end point definitions. Broad CHD and CVD end points including fatal and non-fatal outcomes had to be used due to the variable definitions. Had it been possible, it would have been preferable to separate the harder CHD outcomes, such as non-fatal myocardial infarction and coronary death, from outcomes that include angina pectoris. We have not examined the ability of the Framingham scores to accurately identify high- and low-risk patients (discrimination) and concentrated only on calibration in different populations. We recognise that the discriminatory ability of a model is an important property; however, it is the calibration that varies most between populations and it is more amenable to adjustment.35
To our knowledge, this is the only study that has reviewed the international literature on the effectiveness of calculating a risk score. One existing review on the validity of Framingham prediction rules included only three studies with data on predicted to observed ratios.36 Our results are consistent with a study by the Diverse Populations Collaborative Group, which examined the accuracy of a single Framingham proportional hazards predictive function in 16 observational studies.37 Unlike the models used in studies in our review, their model had not been used as a risk score in clinical practice. Nevertheless, like us, they concluded that their model tended to overpredict absolute risk in populations with low observed CHD mortality and to underpredict risk in populations with high CHD mortality.
The findings of this review suggest that true cardiovascular risk in low-risk populations is likely to be overestimated, perhaps leading to unnecessary treatment of many patients. Conversely, in high-risk populations, true cardiovascular risk is likely to be underestimated, potentially resulting in these high-risk people not reaching a treatment threshold and being denied appropriate drug treatment. For example, in a deprived Scottish population, Framingham-predicted CVD mortality risk tended to increase with increasing socioeconomic deprivation. However, the predicted gradient significantly underestimated the observed gradient of increasing risk across socioeconomic groups.38 This inaccuracy between populations is relevant to other cardiovascular risk-scoring methods based on similar combinations of risk factors. Including a variable representing social deprivation may improve the performance of risk prediction models. Recalibrating the prediction models to adjust for the background risk of different geographical regions4 and ethnic groups35 is an alternative solution.
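As a rough illustration of what recalibration to background risk can involve, the Python sketch below uses a generic Cox-type risk function, in which the relative effects of the risk factors are kept but the baseline survival and the mean risk factor levels are replaced with values from the target population. This is one common approach and is not necessarily the method used in the cited studies; the coefficients, means and baseline survival values are hypothetical.

```python
import math


def cox_risk(x, betas, means, baseline_survival):
    """10-year risk from a Cox-type function: 1 - S0 ** exp(sum(beta * (x - mean)))."""
    linear_predictor = sum(b * (xi - m) for b, xi, m in zip(betas, x, means))
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients for two risk factors: age (years), current smoker (0/1).
betas = [0.05, 0.6]

# Values from the population in which the score was derived (hypothetical).
derivation_means, derivation_s0 = [50.0, 0.30], 0.90

# Values from a lower-risk target population (hypothetical).
target_means, target_s0 = [50.0, 0.30], 0.95

person = [60.0, 1.0]  # a 60-year-old smoker
original = cox_risk(person, betas, derivation_means, derivation_s0)
recalibrated = cox_risk(person, betas, target_means, target_s0)
print(f"original {original:.3f}, recalibrated {recalibrated:.3f}")
```

Because the coefficients are unchanged, the ranking of individuals (discrimination) is preserved while the absolute risks shift towards the level observed in the target population.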
No matter how well calibrated a risk score may be, its primary purpose is to improve the management of those patients it identifies as being at high cardiovascular risk. This involves understanding how a clinician and patient interact once cardiovascular risk has been assessed. While absolute cardiovascular risk assessment remains the recommended method of targeting primary prevention, considerable work is needed to make it a practical and effective clinical tool.
We thank the members of the advisory panel for their expert advice and assistance with checking the accuracy of the data extraction. We are also grateful to the authors of studies that provided additional information and to Margaret Burke for her help with the searches.
Published Online First 18 April 2006
This study was funded by the Policy Research Programme of the UK Department of Health, project number RDD/030/064, and was carried out while PB was being supported by the Wellcome Trust. The views expressed here are those of the authors and not necessarily those of the funding agencies. The funding agencies had no role in the data collection or in the writing of this paper. The guarantor accepts full responsibility for the conduct of the study, had access to the data and controlled the decision to publish.
Competing interests: None declared.
Ethical approval: None required.
Contributors: PB and ADB identified the papers and extracted the data. PB drafted the paper and all the authors contributed to the interpretation of the data and the writing of the paper and have seen and approved the final version of the paper. PB will act as the guarantor of the paper.