Performance of the QRISK cardiovascular risk prediction algorithm in an independent UK sample of patients from general practice: a validation study
- 1Division of Primary Care, University Park, Nottingham, UK
- 2Centre for Health Sciences, Queen Mary’s School of Medicine and Dentistry, London, UK
- 3Avon Primary Care Research Collaborative, Bristol Primary Care Trust, Bristol, UK
- Professor J Hippisley-Cox, Division of Primary Care, 13th Floor, Tower Building, University Park, Nottingham NG2 7RD, UK
- Accepted 1 October 2007
- Published Online First 4 October 2007
Aim: To assess the performance of the QRISK score for predicting cardiovascular disease (CVD) in an independent UK sample from general practice and compare with the Framingham score.
Design: Prospective open cohort study.
Setting: UK general practices contributing to the THIN and QRESEARCH databases.
Cohort: The THIN validation cohort consisted of 1.07 million patients aged 35–74 years, registered at 288 THIN practices between 1 January 1995 and 1 April 2006. The QRESEARCH validation cohort consisted of 0.61 million patients from 160 practices (one-third of the full database) with data until 1 January 2007. Patients who were receiving statins or who had diabetes or CVD at baseline were excluded.
End point: First diagnosis of CVD (myocardial infarction, coronary heart disease (CHD), stroke and transient ischaemic attack) recorded on the clinical computer system during the study period.
Exposures: Age, sex, smoking status, systolic blood pressure, total/high-density lipoprotein cholesterol ratio, body mass index, family history of premature CHD, deprivation and antihypertensive medication.
Results: Characteristics of both cohorts were similar, except that THIN patients were from slightly more affluent areas and had lower recording of family history of CHD. QRISK performed better than Framingham for every discrimination and calibration statistic in both cohorts. Framingham overpredicted risk by 23% in the THIN cohort, while QRISK underpredicted risk by 12%.
Conclusion: This analysis demonstrated that QRISK is better calibrated to the UK population than Framingham and has better discrimination. The results suggest that QRISK is likely to provide more appropriate risk estimates than Framingham to help identify patients at high risk of CVD in the UK.
In July 2007, we published the results of a new cardiovascular disease (CVD) prediction algorithm known as QRISK.1 This algorithm uses existing traditional “risk factors” (age, systolic blood pressure, smoking status and ratio of total serum cholesterol to high-density lipoprotein level) for CVD but also incorporates deprivation, family history, antihypertensive treatment and body mass index. The QRISK algorithm was developed using a validated clinical research database (QRESEARCH), which consists of routinely collected data from general practitioner clinical computer systems. The original research used two-thirds of the database to derive the algorithm, with the remaining third acting as a validation sample.
The performance of QRISK was compared with two existing equations used in the UK, one based on the American Framingham cohort2 and the other on the Scottish ASSIGN cohort.3 The results demonstrated that QRISK was not only better calibrated to the UK population but also had improved discrimination compared with these existing equations. Although the validation was based on an independent sample of practices within the QRESEARCH database (http://www.qresearch.org, accessed 22 October 2007), the sample is likely to be homogeneous because all practices use the EMIS clinical computer system, which is used in 60% of UK practices. Since the intended use of the score includes non-EMIS practices, we decided to do a second validation using the THIN database (http://www.thin-uk.com, accessed 22 October 2007), based on practices recording clinical data using the INPS Vision system, which is used in 20% of UK practices. This would be a more rigorous test of the performance of QRISK compared with Framingham. This paper describes the results.
The methods for the derivation and validation of QRISK have been described in detail elsewhere.1 For this analysis, we used the original validation sample (ie, a random sample of one-third of the practices in the QRESEARCH database) with a subset of patients aged 35–74 years, excluding patients with diabetes, those with CVD and those prescribed statins at baseline. We followed up the QRESEARCH cohort from 1 January 1995 until 1 January 2007. We identified a second cohort of patients from the THIN database with the same inclusion and exclusion criteria, and registered between 1 January 1995 and 31 March 2006 (the latest data available for analysis).
Outcome, risk factors, prescribed medication and missing values
We used identical Read code definitions to determine the clinical outcome (CVD), each risk factor and prescribed drugs for each cohort. We used the clinical value recorded closest to the date on which the patient entered the study for body mass index, systolic blood pressure, smoking status, and total and high-density lipoprotein cholesterol. We generated the Framingham score2 and the QRISK score1 for each individual patient in both cohorts. To generate scores where values for risk factors were missing, we used reference values derived from the derivation cohort of QRESEARCH by 5-year bands of age and sex, assuming patients were non-smokers where smoking status was not recorded.1 We used a revised equation for QRISK (version 1.1) which takes account of improvements in the method for multiple imputation of missing data, in which additional variables (including the outcome variable) were included in the imputation model.4 This revised equation now excludes patients prescribed statins at baseline, so the results from the QRESEARCH validation database presented here will differ slightly from those previously published.1
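The substitution of reference values for missing risk factors can be sketched as follows. This is a minimal illustration, not the code used in the study: the reference table, every number in it, the field names and the function names are hypothetical placeholders.

```python
# Hypothetical reference values keyed by (sex, lower bound of 5-year age band).
# The numbers are placeholders for illustration, not values from the study.
REFERENCE = {
    ("M", 35): {"sbp": 130.0, "bmi": 26.0, "chol_ratio": 4.5},
    ("M", 40): {"sbp": 132.0, "bmi": 26.5, "chol_ratio": 4.6},
    ("F", 35): {"sbp": 122.0, "bmi": 25.0, "chol_ratio": 3.8},
    ("F", 40): {"sbp": 125.0, "bmi": 25.5, "chol_ratio": 3.9},
}


def age_band(age: int) -> int:
    """Return the lower bound of the 5-year band containing `age` (37 -> 35)."""
    return (age // 5) * 5


def fill_missing(patient: dict) -> dict:
    """Return a copy of `patient` with missing (None) risk factors replaced
    by the reference value for the patient's sex and 5-year age band."""
    ref = REFERENCE[(patient["sex"], age_band(patient["age"]))]
    filled = dict(patient)
    for key, value in ref.items():
        if filled.get(key) is None:
            filled[key] = value
    # Unrecorded smoking status is treated as non-smoker, as described above.
    if filled.get("smoker") is None:
        filled["smoker"] = False
    return filled


patient = {"sex": "M", "age": 37, "sbp": None, "bmi": 27.2,
           "chol_ratio": None, "smoker": None}
filled = fill_missing(patient)
```

In this sketch recorded values are never overwritten; only fields that are absent or `None` take the reference value.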
We used the Townsend score evaluated at output area as a proxy for material deprivation. The THIN dataset differs from the QRESEARCH dataset in that each patient in the THIN dataset is allocated to a fifth of deprivation and only the category number is provided. In contrast, each patient in the QRESEARCH dataset is allocated the individual Townsend score corresponding to their output area of residence (ie, continuous data). To calculate the QRISK equation in the THIN cohort, we used the median value for each fifth defined from a national postcode table mapped to output area Townsend scores derived from the 2001 census.
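The mapping from a THIN deprivation fifth to a single Townsend value amounts to a table lookup, sketched below. The median values shown are hypothetical placeholders, not the medians derived from the national postcode table, and the function name is illustrative only.

```python
# Hypothetical median Townsend score for each national fifth
# (1 = most affluent, 5 = most deprived). Placeholder numbers only.
MEDIAN_TOWNSEND_BY_FIFTH = {1: -3.5, 2: -2.0, 3: -0.5, 4: 1.5, 5: 5.0}


def townsend_from_fifth(fifth: int) -> float:
    """Return the median Townsend score standing in for a patient's fifth."""
    if fifth not in MEDIAN_TOWNSEND_BY_FIFTH:
        raise ValueError(f"deprivation fifth must be 1-5, got {fifth!r}")
    return MEDIAN_TOWNSEND_BY_FIFTH[fifth]


score_most_affluent = townsend_from_fifth(1)
score_most_deprived = townsend_from_fifth(5)
```

Every patient in the same fifth receives the same value under this approach, so within-fifth variation in deprivation is not represented in the THIN calculation.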
Discrimination and calibration statistics
We tested the performance of QRISK in both the THIN and the QRESEARCH validation cohorts. We calculated the 10-year estimated CVD risk for each patient in the validation datasets, replacing missing values as described.
To assess calibration (ie, the degree of similarity between predicted and observed risks), we calculated the mean predicted CVD risk at 10 years and the observed CVD risk at 10 years, obtained using the 10-year Kaplan–Meier estimate, and compared the ratio of predicted to observed CVD risk for patients in the validation cohort in each decile of predicted risk. We also compared predicted and observed risks overall for men and women and calculated the Brier score.5 To assess discrimination (ie, the ability of a risk prediction equation to distinguish between those who do and do not have a cardiovascular event during the follow-up period), we calculated the area under the receiver operating characteristic curve. We also calculated the D statistic6 and an R2 statistic derived from the D statistic,7 which are measures of discrimination and explained variation appropriate for survival models. The D statistic was developed as a measure of discrimination specifically for censored survival data: higher values indicate better discrimination, and an increase in the D statistic of at least 0.1 indicates an important difference in prognostic separation between risk classification schemes.
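The statistics named above can be illustrated with a short sketch. It is not the Stata code used for the analysis, and all function names are hypothetical. Two simplifications are assumed: the Brier score and the area under the curve are computed here on a plain binary outcome, ignoring censoring, and the conversion from D to R2 uses the form commonly given for proportional hazards models, R2 = (D²/κ²) / (π²/6 + D²/κ²) with κ² = 8/π.

```python
import math


def km_risk(times, events, horizon):
    """Kaplan-Meier estimate of the probability of an event by `horizon`.
    `events` holds 1 for an observed event and 0 for a censored record."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failed / at_risk
    return 1.0 - survival


def calibration_ratio(predicted_risks, observed_risk):
    """Mean predicted risk divided by observed risk (1.0 = agreement)."""
    return (sum(predicted_risks) / len(predicted_risks)) / observed_risk


def brier_score(predicted_risks, outcomes):
    """Mean squared difference between predicted risk and binary outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted_risks, outcomes)) / len(outcomes)


def roc_area(scores, outcomes):
    """Probability that a random case scores higher than a random non-case;
    ties count as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def r2_from_d(d):
    """Explained variation derived from the D statistic (assumed formula)."""
    scaled = d ** 2 / (8.0 / math.pi)
    return scaled / (math.pi ** 2 / 6.0 + scaled)


# Toy data: follow-up in years, event indicator, predicted 10-year risk.
times = [2, 4, 6, 12, 12]
events = [1, 0, 1, 0, 0]
observed = km_risk(times, events, horizon=10)
scores = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
```

Under this convention a calibration ratio of 1.23 reads as overprediction by 23%, and a ratio of 0.88 as underprediction by 12%.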
Comparison with Framingham
We compared the performance of QRISK against the risk estimates derived from Framingham equations.2 We used a CVD risk equation which was computed by summing the coronary risk (including myocardial infarction and coronary heart disease (CHD) death plus angina plus coronary insufficiency) and stroke risk (including transient ischaemic attack) as these outcomes are closest to those used in randomised trials of drug effectiveness. It is also the outcome used in the current Joint British Society Guidelines.8 We calculated the measures of discrimination described above for the Framingham estimates in the validation dataset and compared them with the equivalent QRISK values.
Finally, we calculated the proportion of patients in the validation samples who have a 10-year CVD risk of 20% or more by age, sex and deprivation according to QRISK and Framingham. Analyses were conducted using Stata (version 9.2).
Study population on THIN and QRESEARCH
Overall, there were 1 787 169 patients from 288 practices in the THIN cohort. Sequentially, we excluded 120 281 patients with a prior diagnosis of CVD, 2253 patients with invalid dates, 284 492 patients who were aged under 35 years, 155 248 patients aged ⩾75, 114 123 with missing Townsend scores, 28 148 patients with diabetes and 9824 patients who were taking statins. This left a cohort of 1 072 800 patients from the THIN database for analysis of whom 529 813 (49.39%) were men. There were 24 practices (54 709 patients) from Scotland and 14 practices (36 904 patients) from Northern Ireland. The corresponding cohort on QRESEARCH contained 607 733 patients and has been described previously.
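The sequential exclusions can be checked with simple arithmetic. The sketch below uses only the counts reported in the paragraph above; the variable names are illustrative.

```python
# Counts taken from the text above, applied in the order reported.
starting_population = 1_787_169
exclusions = [
    ("prior diagnosis of CVD", 120_281),
    ("invalid dates", 2_253),
    ("aged under 35 years", 284_492),
    ("aged 75 years or over", 155_248),
    ("missing Townsend score", 114_123),
    ("diabetes", 28_148),
    ("taking statins", 9_824),
]

remaining = starting_population
for reason, count in exclusions:
    remaining -= count  # each exclusion is applied to those still in the cohort

men = 529_813
percent_men = round(100 * men / remaining, 2)
```

The running total ends at 1 072 800 patients, of whom 49.39% are men, in agreement with the reported figures.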
Table 1 shows a comparison of the THIN and QRESEARCH validation cohorts for the validation analysis. The proportions of men and women were similar in both datasets. Patients from THIN tended to be from more affluent areas with 28% in the most affluent fifth and 12% in the most deprived. The corresponding figures for patients in the QRESEARCH cohort were 24% and 16%, showing a slightly more even distribution across the fifths. The 10-year risk of CVD was marginally higher in men in THIN than in QRESEARCH (9.87% vs 9.16%) but was more similar for women (6.55% vs 6.51%).
Table 2 shows the completeness of the recording for each risk factor for men and women in both cohorts. Data for age, sex and deprivation quintile were complete for all patients included in the analysis. Overall, levels of recording were very similar between the two cohorts, with the highest levels for systolic blood pressure and smoking, which were recorded for more than 85% of patients in both cohorts. The lowest levels were observed for high-density lipoprotein cholesterol, with recorded values in 29–31% of patients. Overall, just over 25% of patients had complete data for all risk factors.
Table 3 shows baseline characteristics for both cohorts for age, sex, risk factors and medication using recorded values only. There was a striking similarity for almost all CVD risk factors between the two cohorts. For example, the mean (SD) systolic blood pressure in men was 135.3 (19.6) mm Hg in QRESEARCH and 135.6 (19.4) mm Hg in THIN. The corresponding figures for body mass index were 26.5 (4.0) kg/m2 in men in QRESEARCH and 26.6 (4.0) kg/m2 in THIN. The recorded prevalence of family history of premature coronary heart disease was substantially lower in THIN than QRESEARCH (3.5% in men in THIN vs 9.2% in men in QRESEARCH).
Table 4 shows the key calibration and discrimination statistics for Framingham and QRISK for both QRESEARCH and THIN cohorts. Framingham overpredicted risk by 23% in the THIN cohort while QRISK underpredicted risk by 12%. QRISK had better values than Framingham for each of the discrimination and calibration statistics in both the THIN and the QRESEARCH validation cohorts.
Table 5 shows the proportion of patients at high risk (ie, more than 20% risk of CVD over 10 years) according to QRISK and Framingham equations by fifth of deprivation in both the QRESEARCH and THIN validation cohorts. The deprivation gradient across the fifths is very similar in both cohorts for QRISK. Both cohorts show a higher proportion of patients at high risk in the most deprived fifth than in the most affluent fifth using QRISK, but a much smaller deprivation gradient for Framingham.
Table 6 shows the distribution of patients at high risk (ie, more than 20% CVD risk over 10 years) according to QRISK and Framingham equations by age band and sex. The pattern is very similar for both cohorts though overall QRISK predicted a slightly smaller proportion at high risk in THIN (7.04%) than it did in the QRESEARCH validation cohort (7.99%).
Overall, 85 009 patients in the THIN cohort (7.9% of the total) would be reclassified from high to low risk or vice versa using QRISK compared with Framingham. Of the 132 076 patients classified at high risk using Framingham, 70 764 (53.6%) would be reclassified as low risk on QRISK. In these patients, the observed 10-year risk was 17.4% (95% CI 16.8% to 17.9%). Conversely, the 14 245 patients classified as low risk on Framingham but high risk on QRISK had an observed 10-year risk of 23.7% (95% CI 22.4% to 25.0%).
Overall, 46 785 patients in the QRESEARCH cohort (7.7% of the total) would be reclassified from high to low risk or vice versa using QRISK compared with Framingham. Of the 76 748 patients classified at high risk using Framingham, 37 479 (48.8%) would be reclassified as low risk on QRISK. In these patients, the observed 10-year risk was 16.7% (95% CI 16.2% to 17.2%). Conversely, the 9306 patients classified as low risk on Framingham but high risk on QRISK had an observed 10-year risk of 24.4% (95% CI 23.2% to 25.6%).
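The reclassification percentages for both cohorts follow directly from the reported counts, as the sketch below shows. The function and key names are illustrative. The proportion at high risk on QRISK is derived here by assuming it equals those who stay high risk on both scores plus those moved from low to high.

```python
def reclassification(total, high_framingham, high_to_low, low_to_high):
    """Summarise movement between risk groups when QRISK replaces Framingham."""
    reclassified = high_to_low + low_to_high
    # High on QRISK = high on both scores + low on Framingham but high on QRISK.
    high_qrisk = (high_framingham - high_to_low) + low_to_high
    return {
        "reclassified": reclassified,
        "pct_of_cohort": round(100 * reclassified / total, 1),
        "pct_framingham_high_moved_low": round(100 * high_to_low / high_framingham, 1),
        "pct_high_on_qrisk": round(100 * high_qrisk / total, 2),
    }


# Counts taken from the two paragraphs above and from the cohort sizes.
thin = reclassification(1_072_800, 132_076, 70_764, 14_245)
qresearch = reclassification(607_733, 76_748, 37_479, 9_306)
```

The derived proportions at high risk on QRISK (7.04% in THIN and 7.99% in QRESEARCH) agree with the figures quoted for Table 6.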
Summary of key findings
This is the first external validation of the new QRISK CVD algorithm. It was conducted on a completely independent sample of patients registered with general practices that use a different clinical computer system, although it was possible to apply identical definitions. As with the original analysis, QRISK outperforms Framingham in the THIN cohort for both calibration and discrimination. The discrimination statistics for QRISK in THIN are as good as those for the original QRISK validation cohort from QRESEARCH. Together, practices using the INPS Vision and EMIS systems account for 80% of UK general practices. The results are also important because this is the first major head-to-head comparison of two recently established major UK general practice databases, and they suggest that the results of similar studies based on either database would be generalisable to the UK.
Baseline comparison of the two cohorts
Although the THIN cohort had more patients, the mean duration of follow-up was less than for patients in the one-third sample of QRESEARCH practices. Overall, the results show a striking similarity between THIN and QRESEARCH patients for almost all baseline characteristics. There are two notable exceptions. First, patients contributing to the THIN database tend to be from more affluent areas. Nonetheless, the incidence of CVD by deprivation shows the expected gradient with a higher risk in the most deprived areas (results available from the authors). Second, fewer patients in the THIN cohort had a recorded family history of CHD in a first-degree relative under the age of 60 years. Given the striking similarity for all the other risk factors and treatment variables, it is likely this reflects a difference in recording patterns between the two clinical computer systems rather than a true difference in prevalence.
In the derivation of the QRISK algorithm we used multiple imputation, as this has been shown to introduce less bias than a complete case analysis.8a For this validation study, we generated reference values by 5-year age band and sex based on the derivation cohort of QRESEARCH, since this is what will be needed to implement the algorithm in software. Again, this is less likely to introduce bias than a complete case analysis, in which the patients themselves are highly selected.
We applied the same method for imputing missing values (including cholesterol ratio) for both the QRISK and the Framingham calculations so that we could make a direct comparison between the two scores. If anything, the imputation would tend to reduce the discriminatory power of both scores.
Comparison of performance in the two cohorts
The discrimination statistics for QRISK in THIN are similar to those for the original QRISK validation cohort from QRESEARCH. There is a small degree of underprediction using QRISK in the THIN cohort which may be due partly to the low recording of family history in the THIN database—something which is likely to improve if the recording of family history is encouraged by inclusion in national guidelines and use of templates to standardise data entry. It may also be due to the slightly higher incidence of CVD in the THIN cohort, particularly in men.
Using QRISK instead of Framingham results in a clinically important reclassification of patients into high or low risk. For example, in the THIN analysis, more than half of the patients classified at high risk according to Framingham would be reclassified as low risk using QRISK, and these patients did have an observed 10-year risk of 17%, below the 20% threshold. Similarly, patients at low risk using Framingham but high risk using QRISK had a high observed 10-year risk of 24%. This shows that using Framingham would miss some patients at high risk of CVD while including others who are actually at low risk.
Our analysis demonstrates that QRISK performs well in an independent sample of patients derived from general practices which use a different clinical computer system in the UK. This is an important test of validity, especially since QRISK has been developed for use in the UK primary care population. The results suggest that GPs in different practices measure the same items with similar validity and that their populations are similar. A stronger test of validity would be to examine whether the score operates where risk factors have been measured under different conditions. These include other primary care settings and other parts of the world which have similar levels of cardiovascular risk. An international version of QRISK is an important area for future research and development.
Framingham and SCORE
Unlike the existing Framingham equations and the European SCORE equation,9 QRISK includes deprivation in the estimation of CVD risk. This will be a significant step in supporting national initiatives to reduce health inequalities in CVD10 and is likely to be an improvement on Framingham, which tends to overestimate risk in affluent areas and underestimate risk in deprived areas.11 Also, a weighting for social deprivation might help minimise health inequalities, which may increase when new interventions are introduced because of the inverse equity hypothesis.12 The inclusion of family history of premature CHD is important because observational studies indicate that premature CHD in a first-degree relative increases risk by 50% or more,13 14 although our estimates were lower than this. QRISK also covers components not included in SCORE, which was designed for European use. SCORE does not include CVD morbidity, which is an integral part of both trial data (non-fatal myocardial infarction and stroke) and cost-effectiveness reviews. The UK health technology assessment (HTA) of statins identified a 10-year CVD event risk of 20% as the threshold for treatment, and most of these events are non-fatal.15
In a major initiative to improve public health, the National Institute for Health and Clinical Excellence (NICE) has lowered the threshold for primary prevention with statins from a 10-year CVD risk of 40% to 20%.16 17 Reliable methods for estimating risk of CVD are now needed to implement the guidelines in clinical practice in a way which does not exacerbate health inequalities. The results of this validation in an independent UK dataset suggest that QRISK is likely to provide more appropriate estimates of CVD risk in contemporary primary care UK populations and better discrimination of those at high risk based on their age, sex, social deprivation and existing antihypertensive treatment. It is therefore likely to be a more equitable and clinically appropriate tool to inform patient management decisions.
We thank David Stables (medical director of EMIS) and EMIS practices for their contribution to the QRESEARCH database. We thank EPIC and the Information Centre for extracting, preparing and checking the THIN dataset used in this analysis.
Competing interests: The authors of this paper were the authors of the original QRISK paper.
Contributorship: The study was initiated and designed by all the authors. All authors contributed to the interpretation of the results and drafting of the paper. The analysis was undertaken by JHC and checked by YV and CC.