
Original research
Independent external validation of the QRISK3 cardiovascular disease risk prediction model using UK Biobank
Ruth E Parsons (1), Xiaonan Liu (1), Jennifer A Collister (1), David A Clifton (2), Benjamin J Cairns (1), Lei Clifton (1)
  1. Nuffield Department of Population Health, University of Oxford, Oxford, UK
  2. Institute of Biomedical Engineering, University of Oxford, Oxford, UK
Correspondence to Lei Clifton; lei.clifton{at}


Objective To externally evaluate the performance of QRISK3 for predicting 10 year risk of cardiovascular disease (CVD) in the UK Biobank cohort.

Methods We used data from the UK Biobank, a large-scale prospective cohort study of 403 370 participants aged 40–69 years recruited between 2006 and 2010 in the UK. We included participants with no previous history of CVD or statin treatment and defined the outcome to be the first occurrence of coronary heart disease, ischaemic stroke or transient ischaemic attack, derived from linked hospital inpatient records and death registrations.

Results Our study population included 233 233 women and 170 137 men, with 9295 and 13 028 incident CVD events, respectively. Overall, QRISK3 had moderate discrimination for UK Biobank participants (Harrell’s C-statistic 0.722 in women and 0.697 in men) and discrimination declined by age (<0.62 in all participants aged 65 years or older). QRISK3 systematically overpredicted CVD risk in UK Biobank, particularly in older participants, by as much as 20%.

Conclusions QRISK3 had moderate overall discrimination in UK Biobank, which was best in younger participants. The observed CVD risk for UK Biobank participants was lower than that predicted by QRISK3, particularly for older participants. It may be necessary to recalibrate QRISK3 or use an alternate model in studies that require accurate CVD risk prediction in UK Biobank.

  • Risk Factors
  • Epidemiology

Data availability statement

Data may be obtained from a third party and are not publicly available. This research has been conducted using the UK Biobank Resource under Application Number 33952. Requests to access the data should be made via application directly to the UK Biobank,

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • The QRISK3 model is routinely used in the UK by healthcare providers to calculate the 10 year cardiovascular disease risk of patients.

  • Although QRISK3 was derived and validated using primary care data, it is used without validation in UK Biobank by researchers.


  • QRISK3 has moderate discrimination for UK Biobank participants but this declines with age.

  • QRISK3 systematically overpredicts cardiovascular disease risk in UK Biobank, particularly in older participants, by as much as 20%.


  • Researchers who require accurate prediction of cardiovascular events in UK Biobank may want to use strategies (eg, recalibration) to improve QRISK3's prediction.

  • Where the use of QRISK3 is not essential, researchers may benefit from developing tailored risk prediction models for their study aims using other risk factors available for UK Biobank participants.


Cardiovascular diseases (CVDs) are the leading cause of global mortality,1 and healthcare providers need to be able to identify patients with a high risk of CVD to allocate primary prevention measures accurately and reliably. Prognostic models can classify individuals into event risk groups, allowing decisions to be made about their healthcare. There are numerous prediction models designed to estimate the risk of developing CVD in use worldwide, including the Framingham,2 SCORE3 and QRISK4–6 models. The QRISK models are Cox proportional hazard models that predict time to CVD events and are routinely used in the UK by healthcare providers to calculate the 10 year risk of CVD for patients during routine NHS health checks.7

The first QRISK model was published in 2007 with the aim of estimating the 10 year risk of CVD in females and males.4 QRISK was followed by an updated model (QRISK2) in 2008, which included additional risk factors compared with its predecessor.6 Since 2008, QRISK2 has been updated continually, with the inclusion of type 1 diabetes as a risk factor, expansion of categories for the smoking variable, and updated Townsend deprivation score with new census data.5 6 The most up to date QRISK model, QRISK3, was derived in 2017 using health records of 1 283 174 patients aged between 35 and 74 years registered between 1995 and 2007 with practices across the UK,6 to incorporate risk factors that were outlined in the National Institute for Health and Care Excellence (NICE) 2014 clinical guideline.8 The risk factors included in the QRISK3 model can be seen in box 1.

Box 1

Risk factors included in the QRISK3 model6

  1. Age at study entry (baseline)

  2. Ethnic origin (nine categories)

  3. Deprivation (as measured by the Townsend score, where higher values indicate higher levels of material deprivation)

  4. Systolic blood pressure

  5. Body mass index

  6. Total cholesterol: high-density lipoprotein cholesterol ratio

  7. Smoking status (non-smoker, former smoker, light smoker (1–9/day), moderate smoker (10–19/day) or heavy smoker (≥20 /day))

  8. Family history of coronary heart disease in a first-degree relative aged less than 60 years

  9. Diabetes (type 1, type 2 or no diabetes)

  10. Treated hypertension (diagnosis of hypertension and treatment with at least one antihypertensive drug)

  11. Rheumatoid arthritis (diagnosis of rheumatoid arthritis, Felty’s syndrome, Caplan’s syndrome, adult-onset Still’s disease or inflammatory polyarthropathy not otherwise specified)

  12. Atrial fibrillation (including atrial fibrillation, atrial flutter and paroxysmal atrial fibrillation)

  13. Chronic kidney disease (general practitioner recorded diagnosis of chronic kidney disease stage 3, 4 or 5) and major chronic renal disease (including nephrotic syndrome, chronic glomerulonephritis, chronic pyelonephritis, renal dialysis and renal transplant)

  14. Measure of systolic blood pressure variability (standard deviation (SD) of repeated measures)

  15. Diagnosis of migraine (including classic migraine, atypical migraine, abdominal migraine, cluster headaches, basilar migraine, hemiplegic migraine, and migraine with or without aura)

  16. Corticosteroid use (British National Formulary (BNF) chapter 6.3.2 including oral or parenteral prednisolone, betamethasone, cortisone, depo-medrone, dexamethasone, deflazacort, efcortesol, hydrocortisone, methylprednisolone or triamcinolone)

  17. Systemic lupus erythematosus (SLE) (including diagnosis of SLE, disseminated lupus erythematosus or Libman-Sacks disease)

  18. Second generation ‘atypical’ antipsychotic use (including amisulpride, aripiprazole, clozapine, lurasidone, olanzapine, paliperidone, quetiapine, risperidone, sertindole or zotepine)

  19. Diagnosis of severe mental illness (including psychosis, schizophrenia or bipolar affective disease)

  20. Diagnosis of erectile dysfunction or treatment for erectile dysfunction (BNF chapter 7.4.5 including alprostadil, phosphodiesterase type 5 inhibitors, papaverine or phentolamine)

QRISK3 was externally validated using primary care data from the Clinical Practice Research Datalink (CPRD) in 2021 and was found to perform well at the overall population level.9 CPRD has a similar case-mix to the QRISK3 derivation cohort, and therefore the external validation of QRISK3 using CPRD measures the reproducibility of the model’s performance. The transportability of QRISK3 (ie, how well it performs in a population with different characteristics to the derivation cohort) must be explored in each independent population that differs considerably in setting to the derivation cohort10 before the model can be used reliably in that independent population.

We have identified multiple studies where the QRISK3 scores for UK Biobank participants are used in the analyses. The extent to which the authors of the identified studies address the discrimination and the calibration of QRISK3 applied to UK Biobank data varies, and the way that this lack of validation may affect the conclusion of these publications depends on the study aims. Without considering the discrimination and calibration of the QRISK3 model applied to UK Biobank data, the accuracy of resulting CVD risk scores is unknown, and any subsequent conclusions that are drawn may be misleading. This study is the first to investigate the calibration of QRISK3 in UK Biobank, and provides an independent external validation of the QRISK3 model applied to this widely used cohort.


Study population

The study design and methods of the UK Biobank study have been described previously.11 12 Approximately 9.2 million people were invited to take part in the UK Biobank study between 2006 and 2010; these were individuals aged between 40 and 69 years who were registered with the NHS and lived within a 25 mile radius of 1 of 22 assessment centres in England, Wales and Scotland.11 A total of 503 325 individuals attended an assessment centre, a response rate of 5.5%.11 12 At baseline, all participants completed a touch-screen questionnaire and a verbal interview, and had physical measurements taken and samples of blood, urine and saliva collected.11 12

To align with the exclusion criteria used in the derivation of QRISK3,6 we excluded UK Biobank participants who had a prior diagnosis of CVD, were using prescribed cholesterol-lowering medication at cohort entry or had missing Townsend deprivation scores. The participants in UK Biobank (40–69 years at baseline) were all within the age range required for the QRISK3 model.

We did not involve participants in the design, conduct, reporting or dissemination plans of our research.

Definition of risk factors

All UK Biobank variables used in the QRISK3 model were measured at the baseline recruitment visit, which took place in 2006–2010. A similar approach was used in the original QRISK derivation, with ‘the value closest to the entry date to the cohort for each patient’ used for each risk factor.4

We matched the risk factors included in the QRISK3 model to the variables available in UK Biobank, including variables collected during the baseline assessment visit at study entry and prevalent disease diagnoses determined through linked hospital records (further detail in online supplemental materials). For sociodemographic risk factors, data from the baseline assessment visit were used; for disease diagnoses, linked healthcare records were used to identify prevalent diagnoses at baseline. When a risk factor required for the QRISK3 model could not be perfectly matched to a UK Biobank field, the closest matching field was used.



The outcome of interest is incident CVD defined in the QRISK3 derivation by a composite outcome of coronary heart disease, ischaemic stroke or transient ischaemic attack.6 We derived this outcome using International Classification of Diseases and the Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4 codes from hospital episode statistics and death registration data (further detail in online supplemental materials). The follow-up time for each participant was calculated as being the number of years from date of baseline assessment until the earliest date of the following: CVD event date, death date by other causes, date of loss-to-follow-up or UK Biobank administrative censoring date (England: 2020-11-30; Scotland: 2020-10-31; Wales: 2018-02-28).
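The follow-up rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (the analyses were conducted in R): the function name, argument names and the 365.25 days-per-year conversion are assumptions, and only the administrative censoring dates are taken from the text.

```python
from datetime import date

# Administrative censoring dates as quoted in the text.
ADMIN_CENSOR = {
    "England": date(2020, 11, 30),
    "Scotland": date(2020, 10, 31),
    "Wales": date(2018, 2, 28),
}

def follow_up(baseline, region, cvd_date=None, death_date=None, lost_date=None):
    """Return (years of follow-up, CVD event indicator) for one participant.

    Hypothetical helper: follow-up ends at the earliest of the CVD event,
    death from other causes, loss to follow-up or administrative censoring.
    """
    candidates = [(ADMIN_CENSOR[region], False)]
    if cvd_date is not None:
        candidates.append((cvd_date, True))
    if death_date is not None:
        candidates.append((death_date, False))
    if lost_date is not None:
        candidates.append((lost_date, False))
    # Earliest date wins; if a CVD event ties with a censoring date,
    # the event is kept (an assumption, not stated in the text).
    end, event = min(candidates, key=lambda c: (c[0], not c[1]))
    years = (end - baseline).days / 365.25  # assumed conversion to years
    return years, event
```

A CVD event recorded after the regional censoring date is treated as censored at that date, which is why the Welsh censoring date can shorten follow-up considerably.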

Statistical analysis

As in the derivation of QRISK3, participants with missing Townsend deprivation scores were excluded and those with missing data on ethnicity were assumed to be white.6 We used multiple imputation (MI) with chained equations to impute missing data on total cholesterol/high-density lipoprotein cholesterol ratio, smoking status, weight, height, systolic blood pressure (SBP) and SBP variability, by gender. To assess the assumption of ‘missing at random’, we compared baseline characteristics before and after MI (online supplemental table 6). Further details on MI can be found in the online supplemental materials.

Model performance of QRISK3 was assessed by both discrimination and calibration. Discrimination measures the ability to distinguish between low-risk and high-risk patients13; patients with higher risk predictions should have higher event rates than those with lower risk predictions.14 15 We assessed discrimination at 10 years, overall and in each age group used in the QRISK3 derivation (35–44, 45–54, 55–64, 65–74 years). We used Harrell’s C-statistic, which quantifies the correlation between ranked predicted and observed survival.16 A Harrell’s C-statistic of 0.5 indicates that the model’s risk predictions are no better than chance at predicting patient outcomes, and a value of 1 indicates perfect separation of patient outcomes. Additionally, we used Royston and Sauerbrei’s D-index,17 which measures the amount of variation in risk between individuals with low and high predicted risks. The D-index can be interpreted as the log hazard ratio between the low-risk and high-risk groups, with higher values showing greater discrimination; an increase of 0.1 or more over other scores is an indicator of improved outcome discrimination.6 18
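To make the definition of Harrell’s C-statistic concrete, the following Python sketch counts concordant pairs directly. It is a simplified illustration, not the implementation used in the study: the function name is hypothetical, pairs with tied follow-up times are ignored, and the quadratic loop is only suitable for small samples.

```python
def harrell_c(times, events, risks):
    """Harrell's C-statistic for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when participant i had an event and
    participant j was followed for longer. The pair is concordant when
    the participant with the earlier event has the higher predicted risk;
    tied predictions count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot anchor a pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.5, 0.7])` has five comparable pairs, of which three are concordant, giving 0.6.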

Calibration assesses how well the predicted risk corresponds to the observed risk on a group level.14 We assessed calibration graphically by comparing the mean predicted risk with the mean observed risk at 10 years, by deciles of the QRISK3 predicted risk distribution. The observed risks were obtained using cumulative incidence Kaplan-Meier estimates at 10 years. Calibration was evaluated overall and in each age group.
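The calibration assessment described above (mean predicted risk versus Kaplan-Meier observed risk within tenths of predicted risk) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, groups are formed by simple rank-based slicing, and it is not the authors' R code.

```python
def km_risk(times, events, horizon):
    """Observed risk at `horizon` as 1 minus the Kaplan-Meier survival estimate."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Handle everyone whose follow-up ends at time t together.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
        at_risk -= n_leaving
    return 1 - survival

def calibration_by_decile(times, events, predicted, horizon=10.0, groups=10):
    """Return (group number, mean predicted risk, observed KM risk) per risk group."""
    order = sorted(range(len(predicted)), key=lambda k: predicted[k])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer participants than groups
        mean_pred = sum(predicted[k] for k in idx) / len(idx)
        observed = km_risk([times[k] for k in idx], [events[k] for k in idx], horizon)
        rows.append((g + 1, mean_pred, observed))
    return rows
```

Plotting the second and third elements of each row against the group number reproduces the kind of comparison a calibration plot shows; a mean predicted risk above the observed risk in a group indicates overprediction in that group.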

All analyses were conducted in R (V.4.1.1).


Data from 502 488 participants in UK Biobank were reviewed for eligibility in this study (figure 1). In line with the QRISK3 derivation exclusion criteria, we excluded 623 participants with missing Townsend deprivation scores, a further 90 296 using statins at baseline and finally 8199 with a previous diagnosis of CVD. The median follow-up time was 11.7 years and 92.4% of participants had 10 years or more of follow-up. Our analysis population consisted of 233 233 female and 170 137 male participants from UK Biobank. The minimum age of the participants enrolled in UK Biobank was 39.7 years for women and 37.4 years for men, and the maximum was 71.0 and 73.7 years, respectively. Table 1 shows the baseline characteristics for the included participants and table 2 shows the quantity of missing data by age and sex. We imputed all missing data using MI and pooled the statistical estimates from 10 imputed datasets using Rubin’s rules (further detail in online supplemental materials).
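Pooling with Rubin’s rules can be illustrated with a short Python sketch. The function name and the example numbers below are hypothetical and are not taken from the study; the sketch assumes each imputed dataset yields a point estimate and its variance, and that the estimate is approximately normally distributed on the scale being pooled.

```python
from math import sqrt

def pool_rubin(estimates, variances):
    """Pool estimates from m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its standard error. The total variance
    is the mean within-imputation variance plus (1 + 1/m) times the
    between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

With three made-up estimates, `pool_rubin([0.70, 0.72, 0.74], [0.0001, 0.0001, 0.0001])` returns a pooled estimate of 0.72 and a standard error larger than the within-imputation value of 0.01, reflecting the extra uncertainty due to the missing data.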

Figure 1

Flow chart of the female and male population used for this study after exclusions for missing Townsend deprivation scores, statin prescription and previous diagnoses of CVD. CVD, cardiovascular disease.

Table 1

Baseline characteristics of all participants in the UK Biobank cohort and the published QRISK3 derivation cohort6 by sex

Table 2

Quantity of missing data of all participants in the UK Biobank at baseline by sex and age

The maximum follow-up time in our study population (after applying exclusion criteria) was 13.95 years, less than the maximum follow-up time of 15 years in the derivation cohort of QRISK3.


Discrimination of QRISK3 was moderate for both female (Harrell’s C-statistic 0.72, D-index 1.28) and male UK Biobank participants (Harrell’s C-statistic 0.70, D-index 1.11) (table 3).

Table 3

Measures of discrimination performance for QRISK3 applied to UK Biobank participants, with 95% CIs

The ability of QRISK3 to discriminate was attenuated as age increased (table 4), with the Harrell’s C-statistic and D-index, respectively, decreasing from 0.72 and 1.39 in the youngest female participants to 0.62 and 0.67 in the oldest female participants. These measures decreased from 0.73 and 1.34 in the youngest male participants to 0.60 and 0.54 in the oldest male participants. Discrimination was better in female participants than male participants overall and by age.

Table 4

Measures of discrimination performance for QRISK3 in the UK Biobank cohort in each age group, with 95% CIs


Figure 2 shows the agreement between the 10 year observed and QRISK3 predicted CVD risk, grouped by the decile of the participant's respective QRISK3 score, such that the 10% of participants with the lowest QRISK3 score were binned into the first decile and so forth. The predicted probability within each decile group is plotted as blue squares in figure 2 and was calculated as the average QRISK3 score within that group. The observed 10 year CVD probability within each group is plotted as orange circles in figure 2 and was calculated using the Kaplan-Meier method (to account for right censoring). The plot suggests that QRISK3 systematically overpredicts CVD risk for UK Biobank participants, with the magnitude of overprediction increasing at higher risk deciles.

Figure 2

Calibration of QRISK3 at 10 years for female and male participants of UK Biobank overall. The cumulative Kaplan-Meier observed CVD probability in each 10th of risk is denoted by the orange circular markers and the mean predicted QRISK3 score in each 10th of risk is denoted by the blue square markers. CVD, cardiovascular disease.

Figure 3 shows the agreement between the 10 year observed and QRISK3 predicted CVD risk, grouped by the decile of the participant's respective QRISK3 score and presented by age group; it can be interpreted in the same way as figure 2. Figure 3 suggests that QRISK3's overprediction of CVD risk for UK Biobank participants may be driven by the increasing magnitude of overprediction for older participants.

Figure 3

Calibration of QRISK3 at 10 years for female and male participants of UK Biobank in each age group. The cumulative Kaplan-Meier observed CVD probability in each 10th of risk is denoted by the orange circular markers and the mean predicted QRISK3 score in each 10th of risk is denoted by the blue square markers.


Principal findings

In this external validation, QRISK3 had moderate ability to discriminate CVD risk for women and men in UK Biobank. The best discriminative accuracy of QRISK3 was seen in the youngest age group for both men and women and discriminative accuracy diminished with age. QRISK3 had poor calibration in UK Biobank, with overprediction of CVD events for both sexes and in all age groups at 10 years of follow-up.

Interpretation of findings

The poor calibration of QRISK3 in UK Biobank contrasts with the findings from the QResearch internal validation cohort6 and CPRD external validation cohort,9 which showed good overall calibration. This discrepancy may be due to differences in population characteristics, in the temporal sequence of risk factors and event occurrence (online supplemental figure 1), and in the prevalence of risk factors between populations. The QResearch and CPRD cohorts were derived from primary care databases and have a different case mix to UK Biobank, which is known to be less representative of the general UK population: its participants tend to be older, are more likely to be female, live in less socioeconomically deprived areas, are less obese, smoke less, drink less alcohol and have fewer health conditions on average, and there is evidence of healthy volunteer bias.19 Additionally, although the ICD-9 and ICD-10 codes used were the same in QRISK3 and UK Biobank, the sources used to derive CVD outcomes differ between the derivation cohort (see online supplemental materials of the study by Hippisley-Cox et al 6) and this study. Specifically, we incorporated CVD-related operative procedures from hospital inpatient data, and we derived the CVD outcome from hospital records and death registrations only, without incorporating CVD diagnoses recorded by general practitioners.

The interactions between age and other predictor variables were already assessed and significant interactions were included in the QRISK3 model,4 5 but we still observed increased overprediction of CVD by age when applying the QRISK3 model to the UK Biobank cohort. The greatest overprediction of CVD risk by QRISK3 in UK Biobank was observed in the oldest age group in both sexes. Although it was not the case in the QResearch internal validation study,6 poor calibration for older participants was also found in the CPRD external validation cohort.9 This may reflect the UK Biobank cohort being considerably older than both the QResearch and CPRD cohorts and that the traditional risk factors in QRISK3 may not account for the unobserved heterogeneity in CVD risk resulting from individuals with the same values of risk factors having different instantaneous CVD risk distribution.20 This heterogeneity may be due to the complex interaction between ageing and cardiovascular physiology, the higher prevalence of undiagnosed or untreated comorbidities in older people and/or the changes in the magnitude of risk factors over the life course.

The discrimination of QRISK3 in UK Biobank is consistent with studies that have applied the model to this population previously,21–24 where the overall discrimination is better in younger compared with older participants in UK Biobank. This is to be expected as older participants have generally higher CVD risk, hence a relatively reduced CVD risk range compared with their younger counterparts. Discrimination is a measure of how well the risk prediction equation ranks participants and since age is an important risk factor for CVD, most CVD risk equations are expected to have reduced ability to rank older participants compared with younger ones within the same cohort.

Where accurate risk prediction and decision-making are important, calibration is arguably more important than discrimination. The consequences of poorly calibrated risk prediction models have been explored more in clinical settings than in epidemiological studies, which may lead to spurious findings in the latter. A model with poor calibration is unlikely to provide true risks when comparing risk estimates between groups in a cohort, and researchers need to be aware of this when reporting their results. The degree to which the performance of a model (when applied to an independent external dataset) should be considered depends on the research objective of the specific study. For studies that require accurate prediction of cardiovascular events in cohorts that are independent from the derivation cohort, there are strategies for minimising the model’s inaccuracy and improving model performance depending on the study aims. These strategies include model recalibration, updating the independent variables, considering using an alternative model and collecting the most appropriate risk factors in cohort studies, as discussed further by Parsons et al.25
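As an illustration of what model recalibration can involve, the sketch below applies one simple form, sometimes called recalibration in the large: every predicted risk is shifted by a single constant on the complementary log-log scale until the mean predicted risk matches the observed risk. For a Cox-type model this is equivalent to rescaling the baseline cumulative hazard. The function names are hypothetical, this is not a method reported in the study, and a single shift cannot correct miscalibration that varies by age or risk group.

```python
from math import exp, log

def shift_risk(p, a):
    """Shift a predicted risk p (0 <= p < 1) by a on the complementary log-log scale.

    Equivalent to multiplying the cumulative hazard by exp(a).
    """
    return 1 - (1 - p) ** exp(a)

def recalibrate_in_the_large(predicted, observed_risk, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the shift a for which the mean shifted risk equals observed_risk.

    Uses bisection; the mean shifted risk increases monotonically with a.
    """
    def gap(a):
        return sum(shift_risk(p, a) for p in predicted) / len(predicted) - observed_risk

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("observed risk is outside the reachable range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

In practice the observed risk would come from a Kaplan-Meier estimate at 10 years in the target cohort, and the shift could be estimated separately by sex or age group if overprediction differs between them. A negative shift indicates that the original model overpredicts on average.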

Researchers working on CVD risk prediction have been moving towards including more personalised factors and using machine learning models, in conjunction with traditional risk factors and statistical models. Such advancements are typically observed in epidemiological cohorts where deep phenotypic and -omics data are available, for example, in UK Biobank. However, these complex models may not be practical when available data for CVD risk prediction are limited to routinely collected data, which is likely the case for healthcare providers in the UK. Future research can focus on improving the consistency and accuracy of CVD risk prediction in both epidemiological studies and routine healthcare.

Strengths and limitations

The major strengths of this study are the large sample size and low loss to follow-up (<0.01% participants). This study followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement,26 covering all 22 checklist items that are essential for good reporting of studies that validate multivariable prediction models.

The coding of some variables in UK Biobank data fields does not exactly match the risk factors and outcomes required in QRISK3 which may limit its predictive abilities in UK Biobank. However, our study provides information in assessing QRISK3 in a population that is different from a primary care cohort, as QRISK3 is often applied outside of clinical settings.


QRISK3 overpredicts CVD risk for participants of UK Biobank, with the magnitude of this overprediction increasing with age. QRISK3 has moderate overall discrimination for UK Biobank participants; however, the discriminative accuracy of the model declines for older participants. Noting the differences in case-mix between UK Biobank and primary care data, researchers using UK Biobank data who require a CVD risk prediction model that is well calibrated or has good discriminatory prediction for older participants may want to consider recalibrating QRISK3 or using an alternative model.


Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the National Health Service’s National Research Ethics Service North West (11/NW/0382). Participants gave informed consent to participate in the study before taking part.


We thank the participants of UK Biobank and the study team for enabling us to conduct this research.




  • Contributors Conceptualisation—RP, LC, BC; data curation—RP, XL, JC; formal analysis—RP, XL, JC; methodology—RP, LC, XL, JC; supervision—LC, BC, DC; original draft and writing—RP; review and editing—RP, XL, JC, LC, BC, DC; guarantor—RP.

  • Funding The UK Biobank study was supported by the Wellcome Trust, Medical Research Council, Department of Health, Scottish government, and Northwest Regional Development Agency. It has also received funding from the Welsh Assembly government and British Heart Foundation. DC declares academic grants from GlaxoSmithKline and personal fees from Oxford University Innovation, Biobeats and Sensyne Health, outside the context of this work. DC is funded by an RAEng Research Chair and an NIHR Research Professorship, in addition to support from the NIHR Oxford Biomedical Research Centre, the InnoHK Centre for Cerebro-cardiovascular Engineering, the Oxford Pandemic Sciences Institute and the Oxford-Suzhou Centre for Advanced Research (China).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.