

Original article
Using electronic health records to predict costs and outcomes in stable coronary artery disease
  1. Miqdad Asaria1,
  2. Simon Walker1,
  3. Stephen Palmer1,
  4. Chris P Gale2,
  5. Anoop D Shah3,
  6. Keith R Abrams4,
  7. Michael Crowther4,
  8. Andrea Manca1,
  9. Adam Timmis5,
  10. Harry Hemingway3,
  11. Mark Sculpher1
  1. Centre for Health Economics, University of York, York, UK
  2. Faculty of Medicine and Health, Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
  3. Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College, London, UK
  4. Department of Health Sciences, University of Leicester, Leicester, UK
  5. NIHR Biomedical Research Unit, Barts and the London NHS Trust, London, UK
  1. Correspondence to Miqdad Asaria, Centre for Health Economics, University of York, Alcuin A Block, Heslington, York YO10 5DD, UK; miqdad.asaria{at}


Objectives To use electronic health records (EHR) to predict lifetime costs and health outcomes of patients with stable coronary artery disease (stable-CAD) stratified by their risk of future cardiovascular events, and to evaluate the cost-effectiveness of treatments targeted at these populations.

Methods The analysis was based on 94 966 patients with stable-CAD in England between 2001 and 2010, identified in four prospectively collected, linked EHR sources. Markov modelling was used to estimate lifetime costs and quality-adjusted life years (QALYs) stratified by baseline cardiovascular risk.

Results For the lowest risk tenth of patients with stable-CAD, predicted discounted remaining lifetime healthcare costs and QALYs were £62 210 (95% CI £33 724 to £90 043) and 12.0 (95% CI 11.5 to 12.5) years, respectively. For the highest risk tenth of the population, the equivalent costs and QALYs were £35 549 (95% CI £31 679 to £39 615) and 2.9 (95% CI 2.6 to 3.1) years, respectively. A new treatment with a hazard reduction of 20% for myocardial infarction, stroke and cardiovascular disease death and no side-effects would be cost-effective if priced below £72 per year for the lowest risk patients and £646 per year for the highest risk patients.

Conclusions Existing EHRs may be used to estimate lifetime healthcare costs and outcomes of patients with stable-CAD. The stable-CAD model developed in this study lends itself to informing decisions about commissioning, pricing and reimbursement. At current prices, to be cost-effective some established as well as future stable-CAD treatments may require stratification by patient risk.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See:


Introduction
Cardiovascular disease (CVD) is a leading cause of mortality in England, with approximately a third of all deaths attributed to it.1 The combination of an ageing population and improvements in survival after acute coronary syndrome2 has resulted in a large and growing number of patients with stable coronary artery disease (stable-CAD). CVD has, therefore, also become a major source of morbidity and healthcare resource use: there are >5 million people living with CVD in England, costing the National Health Service (NHS) more than £30 billion per year.3,4 The stable-CAD population serves as an important example of a patient population with a long-term condition. As such conditions become increasingly prevalent, questions regarding their prognosis have become increasingly important.5,6 The prognosis of patients with stable-CAD is particularly topical, with new treatments7 and new applications of existing treatments8 currently undergoing phase III trials in this patient population.

Thus far, most models that estimate the costs and health effects of CVD have had one or more of the following limitations. They have focused on primary prevention.9,10 They have made predictions only over relatively short time horizons (up to 10 years),11 so they cannot estimate lifetime costs and health effects. They are based on selected samples,12 which may bias baseline risk and cost estimates and hence limit their generalisability. Or they fail to model all relevant endpoints and their interdependence.13 The use of linked electronic health records (EHR) can address many of these limitations in modelling the costs and outcomes of chronic diseases by providing a source of long-term data, capturing a wide range of clinical endpoints and recording resource use in a real-world setting. As far as we are aware, there has been limited use of EHR in decision modelling.

The availability of primary care data linked with hospitalisation data, disease-specific registries and mortality data makes the English NHS an attractive setting in which to develop and demonstrate our approach for modelling the long-term costs and outcomes of chronic disease. The CALIBER (CArdiovascular disease research using Linked BEspoke studies and Electronic Health Records) data platform14 used in this study combines these key datasets and has been shown to be a valuable resource for cardiovascular epidemiology.12,15–17 This paper reports on the use of CALIBER to model prognosis in patients with stable-CAD, estimating their baseline risk of experiencing further CVD events and then predicting both costs and key health outcomes over the lifetime of these patients stratified by their baseline CVD risk. In doing so, the model provides a better understanding of the implications of this growing population under current standards of care as well as a framework for the evaluation of the cost-effectiveness of new treatment strategies, potentially differentiated by risk group.

Methods
Patient population

The model was based on the analysis of 94 966 patients with stable-CAD from the CALIBER collaboration. CALIBER links primary care data from the Clinical Practice Research Datalink with EHR from the Myocardial Ischaemia National Audit Project Registry, hospital inpatient records from Hospital Episode Statistics and cause-specific mortality from the Office for National Statistics. The CALIBER dataset has been described in detail by Denaxas et al.14 Patients with stable-CAD were defined as those patients in the CALIBER dataset who were event free for at least 6 months after having had unstable angina, ST elevation myocardial infarction (STEMI) or non-STEMI (NSTEMI) or those patients with stable angina or other coronary heart disease (CHD) diagnoses. The median follow-up of these patients was 4.2 (IQR 1.9–6.9) years, during which 16 783 patients died and 8203 patients experienced one or more non-fatal coronary outcomes.


The primary clinical endpoints were first occurrences of non-fatal myocardial infarction (MI), ischaemic stroke and haemorrhagic stroke, as well as CVD and non-CVD mortality. Other clinical endpoints were CVD and non-CVD mortality following a non-fatal event. These were combined to produce the primary economic outputs of the model, namely quality-adjusted life years (QALYs) and total and CVD-specific costs, each predicted over the remaining lifetime of the patient. The model was also used to produce estimates of event rates and disease progression over time stratified by baseline CVD risk.


A state transition model (shown in figure 1) was developed to capture the natural history of patients with stable-CAD. The structure of the model was determined with reference to both previous models in CVD13 and expert clinical advice. All patients entered the model in the stable-CAD state and progressed through the model until they experienced either CVD or non-CVD mortality; the time horizon of the model was, therefore, the patient’s remaining lifetime. The model captured time-varying and age-dependent risks, costs and health-related quality of life (HRQoL) in 90-day cycles. Costs and HRQoL were attached to model states and, in order to stratify by patients’ baseline risk, were adjusted for patient covariates at baseline as well as for age and for time elapsed following non-fatal events. Model-predicted costs, life years and QALYs were discounted at 3.5% per annum in keeping with the guidelines in England.18 While only first occurrences of non-fatal CVD events were explicitly modelled, further non-fatal events were implicitly captured in the time-varying risk, cost and HRQoL estimates.
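The cohort mechanics described above can be illustrated with a minimal Python sketch: 90-day cycles, absorbing death states, discounting at 3.5% per annum, and costs and HRQoL attached to states. This is not the authors' implementation. It assumes a single time-homogeneous transition matrix and placeholder costs and utilities, whereas the published model uses time-varying, age-dependent and covariate-adjusted inputs. Discounting each cycle from its start time is one of several reasonable conventions.

```python
import numpy as np

# The six states of figure 1 (labels are illustrative).
STATES = ["stable_CAD", "post_MI", "post_isch_stroke", "post_haem_stroke",
          "CVD_death", "nonCVD_death"]
ALIVE = slice(0, 4)              # the first four states are alive states
CYCLE_YEARS = 90 / 365.25        # one 90-day cycle expressed in years
ANNUAL_DISCOUNT = 0.035          # 3.5% per annum


def run_cohort(P, cost_per_cycle, utility, tol=1e-6, max_cycles=1000):
    """Run a cohort until (almost) everyone has died; return discounted totals.

    P: (6, 6) per-cycle transition matrix whose rows sum to 1
    cost_per_cycle: cost of spending one cycle in each state
    utility: annual HRQoL weight for each state (0 for the death states)
    """
    trace = np.zeros(len(STATES))
    trace[0] = 1.0                          # all patients start in stable-CAD
    cost = qalys = life_years = 0.0
    for cycle in range(max_cycles):
        alive = trace[ALIVE].sum()
        if alive < tol:                     # stop once the cohort has died
            break
        # Discount factor based on the time elapsed at the start of the cycle.
        df = (1 + ANNUAL_DISCOUNT) ** -(cycle * CYCLE_YEARS)
        cost += df * (trace @ cost_per_cycle)
        qalys += df * CYCLE_YEARS * (trace @ utility)
        life_years += df * CYCLE_YEARS * alive
        trace = trace @ P                   # advance the cohort one cycle
    return cost, qalys, life_years


# Hypothetical inputs, NOT estimates from CALIBER. Only first non-fatal
# events are modelled, so post-event states lead only to death.
P = np.array([
    [0.970, 0.008, 0.004, 0.001, 0.007, 0.010],
    [0.0,   0.975, 0.0,   0.0,   0.015, 0.010],
    [0.0,   0.0,   0.970, 0.0,   0.020, 0.010],
    [0.0,   0.0,   0.0,   0.960, 0.030, 0.010],
    [0.0,   0.0,   0.0,   0.0,   1.0,   0.0],
    [0.0,   0.0,   0.0,   0.0,   0.0,   1.0],
])
cost = np.array([500.0, 900.0, 1000.0, 1100.0, 0.0, 0.0])
utility = np.array([0.75, 0.70, 0.60, 0.55, 0.0, 0.0])
total_cost, total_qalys, total_lys = run_cohort(P, cost, utility)
```

To obtain a time-varying version, `P` would be recomputed every cycle from the risk equations rather than held fixed.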

Figure 1

Structure of the Markov model and the role played by the 11 risk equations that we use to model disease progression.

Statistical modelling of risk equations

Rapsomaniki et al19 developed, tested and validated a range of prognostic models for patients with stable-CAD using the CALIBER dataset. We built on their recommended prognostic model, using it as the basis for the risk equations underpinning the prediction of the five primary clinical endpoints. Using the prognostic factors and missing data imputation algorithm of Rapsomaniki et al,19 we estimated various parametric survival models (generalised gamma, lognormal, Weibull, exponential) for each of the five endpoints. For each endpoint, the best-fitting parametric model was selected using the Akaike information criterion (AIC). Predictions resulting from the selected models were assessed for plausibility by clinical experts (AT, CPG, ADS, HH). Key prognostic factors included in the models were demographic measures (age, sex, social deprivation), stable-CAD subtype (stable angina, unstable angina, STEMI, NSTEMI and other CHD), use of long-acting nitrates, whether coronary artery bypass graft or percutaneous coronary intervention (PCI) had been performed in the 6 months following CAD diagnosis, previous MI, smoking, blood pressure, diagnosis of hypertension, diabetes, lipids, CVD comorbidities (heart failure, peripheral arterial disease, atrial fibrillation, stroke), non-CVD comorbidities (chronic renal disease, chronic obstructive pulmonary disease, cancer, chronic liver disease), psychosocial factors (depression, anxiety) and clinically assessed biomarkers (heart rate, white cell count, haemoglobin, creatinine).
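The following Python sketch shows how a parametric distribution can be selected by AIC. It uses simulated right-censored data, not CALIBER. For brevity it compares only two of the four candidates named above, the exponential and the Weibull. The generalised gamma and lognormal would be added as further entries in the same comparison. The closed-form and profile-likelihood expressions are standard results for these two distributions. The golden-section search is a deliberately simple optimiser chosen for illustration; it is not what the authors used.

```python
import math
import random


def exponential_loglik(times, events):
    """Maximised log-likelihood of an exponential model with right censoring."""
    d, exposure = sum(events), sum(times)
    rate = d / exposure                      # closed-form MLE of the rate
    return d * math.log(rate) - rate * exposure


def weibull_profile_loglik(k, times, events):
    """Weibull log-likelihood at shape k, with the scale profiled out.

    For fixed k the MLE of scale**k is sum(t**k) / d.
    """
    d = sum(events)
    theta = sum(t ** k for t in times) / d
    sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)
    return d * math.log(k) - d * math.log(theta) + (k - 1) * sum_log_t - d


def weibull_loglik(times, events, lo=0.05, hi=20.0, iters=100):
    """Maximise the profile log-likelihood over k by golden-section search."""
    f = lambda k: weibull_profile_loglik(k, times, events)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, e = b - g * (b - a), a + g * (b - a)
    fc, fe = f(c), f(e)
    for _ in range(iters):
        if fc > fe:                  # maximum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                        # maximum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + g * (b - a)
            fe = f(e)
    k_hat = (a + b) / 2
    return f(k_hat), k_hat


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


# Simulated data (NOT CALIBER): Weibull times, administratively censored at 6 years.
random.seed(1)
raw = [random.weibullvariate(5.0, 2.0) for _ in range(500)]
times = [min(t, 6.0) for t in raw]
events = [1 if t <= 6.0 else 0 for t in raw]

exp_ll = exponential_loglik(times, events)
weibull_ll, shape_hat = weibull_loglik(times, events)
scores = {"exponential": aic(exp_ll, 1), "weibull": aic(weibull_ll, 2)}
best = min(scores, key=scores.get)       # the lowest AIC is preferred
```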

Risk equations for the six subsequent events, namely, CVD and non-CVD mortality following non-fatal MI, ischaemic stroke and haemorrhagic stroke, were estimated in a similar way. However, due to the greatly reduced numbers of events observed, these use only sex and age at time of non-fatal event as covariates. Non-CVD mortality beyond the maximum follow-up in the CALIBER dataset (10 years) was based on age/sex-specific non-CVD mortality from national life tables.20

These risk equations were developed into cumulative incidence functions, which were then combined using a competing risks framework to account for the interdependence of the outcomes. We used methods outlined by Putter et al,21 which acknowledge that state transition probabilities are affected not only by the event being modelled but also by the other events that could occur from a given health state. Survival models were estimated using R (V.3.1.0) and the R package flexsurv (V.0.3).
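The sketch below shows the step from cause-specific hazards to per-cycle transition probabilities. It assumes Weibull hazards with made-up parameters and a single constant life-table rate for non-CVD mortality beyond 10 years, although real life tables vary by age and sex. It also uses simple numerical integration. It illustrates the general competing-risks calculation and does not reproduce the authors' implementation of the Putter et al approach.

```python
import math

CYCLE_YEARS = 90 / 365.25


def weibull_hazard(shape, scale):
    """Hazard function h(t) of a Weibull distribution."""
    return lambda t: (shape / scale) * (t / scale) ** (shape - 1)


def spliced_hazard(fitted, life_table_rate, cutoff=10.0):
    """Fitted hazard within follow-up; an external (life-table) rate beyond it.

    A constant rate is a simplification: real life tables vary by age and sex.
    """
    return lambda t: fitted(t) if t <= cutoff else life_table_rate


def cycle_probabilities(hazards, start, length=CYCLE_YEARS, steps=200):
    """Probability that each competing event occurs first in [start, start + length),
    given that the patient is event free at `start`.

    Integrates h_j(u) * S(u) du (cumulative incidence), where S(u) is the
    probability of remaining free of all competing events since `start`.
    Uses the midpoint rule on `steps` sub-intervals.
    """
    du = length / steps
    probs = [0.0] * len(hazards)
    cum_hazard = 0.0                        # all-cause hazard since `start`
    for i in range(steps):
        u = start + (i + 0.5) * du
        rates = [h(u) for h in hazards]
        total = sum(rates)
        surv_mid = math.exp(-(cum_hazard + 0.5 * total * du))
        for j, rate in enumerate(rates):
            probs[j] += rate * surv_mid * du
        cum_hazard += total * du
    return probs


# Hypothetical parameters (NOT the fitted CALIBER equations); time in years.
hazards = [
    weibull_hazard(1.1, 60.0),                        # non-fatal MI
    weibull_hazard(1.2, 120.0),                       # ischaemic stroke
    weibull_hazard(1.0, 400.0),                       # haemorrhagic stroke
    weibull_hazard(1.3, 40.0),                        # CVD death
    spliced_hazard(weibull_hazard(1.2, 50.0), 0.04),  # non-CVD death
]
p_events = cycle_probabilities(hazards, start=12.0)
p_stay = 1.0 - sum(p_events)                          # remain in stable-CAD
```

Because every hazard removes patients from the same event-free pool, raising one hazard lowers the probabilities of the others. This is the interdependence the competing risks framework is meant to capture.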

Resource use and costs

Healthcare resource use was estimated directly from the CALIBER dataset. A panel was constructed using a 90-day cycle length for patients with stable-CAD in CALIBER capturing resource use in terms of hospital episodes, use of drugs, diagnostic tests and primary care consultations. Costs were attached to this resource use using the NHS reference costs,22 NHS prescription cost analysis23 and Personal Social Services Research Unit (PSSRU) unit costs for primary care24 datasets. All costs were calculated from a health systems perspective and based on the price year 2011/2012. Panel data models were used to estimate patient costs adjusted for the prognostic factors used in the model, as well as for the key CVD events in the model. This allowed us to attach costs to model states adjusted for baseline patient characteristics and event history.
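The unit-costing step multiplies resource use by unit costs and sums the result per patient per 90-day period. It might look like the following. The resource categories mirror those listed above. The unit costs are placeholders, not values from NHS reference costs, the prescription cost analysis or PSSRU, and the record layout is hypothetical.

```python
from collections import defaultdict

# Placeholder unit costs in GBP (NOT actual NHS or PSSRU 2011/2012 figures).
UNIT_COSTS = {
    "hospital_episode": 2500.0,
    "drug_item": 10.0,
    "diagnostic_test": 50.0,
    "gp_consultation": 40.0,
}
CYCLE_DAYS = 90


def period_costs(records):
    """Sum unit-costed resource use into (patient, 90-day period) cells.

    records: iterable of (patient_id, day_since_entry, resource_type, quantity)
    """
    panel = defaultdict(float)
    for patient_id, day, resource, qty in records:
        period = day // CYCLE_DAYS          # 0 for days 0-89, 1 for 90-179, ...
        panel[(patient_id, period)] += qty * UNIT_COSTS[resource]
    return dict(panel)


records = [
    ("p1", 10, "gp_consultation", 2),
    ("p1", 45, "drug_item", 3),
    ("p1", 120, "hospital_episode", 1),
    ("p2", 5, "diagnostic_test", 1),
]
panel = period_costs(records)
# panel == {("p1", 0): 110.0, ("p1", 1): 2500.0, ("p2", 0): 50.0}
```

Periods with no recorded resource use do not appear in the output. A panel used for regression would also need those zero-cost patient-periods filled in.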

Health-related quality of life

HRQoL estimates were not available from the CALIBER dataset. Instead a catalogue of EQ-5D scores for the UK25 was used to calculate age-specific, condition-specific and event-specific HRQoL. These were attached to states in the model to calculate patient-specific estimates of remaining lifetime QALYs.


Given that the model was designed to be used with a heterogeneous population, results were produced stratified by risk group. The 5-year baseline risk of experiencing at least one CVD event was predicted for each patient with stable-CAD in the CALIBER dataset, using the estimated risk equations with the patient's baseline covariate values as input parameters. The baseline values were those of the prognostic factors used in the risk equations, measured at the point that the patient entered the stable-CAD cohort. Patients were ranked by predicted risk and grouped into 10 equally sized risk groups. Model results were calculated at the mean baseline covariate value across patients within each risk group. In addition, estimates were predicted for a representative patient within each of the 10 risk groups, demonstrating both the population-level and patient-level results produced by the model. The model was evaluated probabilistically by means of a Monte Carlo simulation run for 1000 iterations in order to incorporate and characterise the uncertainty in the model inputs.26
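The sketch below covers two steps: grouping patients into risk tenths and summarising each group by its mean covariates, and the Monte Carlo propagation of input uncertainty. The data are simulated. The "model" is a toy that computes life expectancy under a constant hazard. Reporting the 2.5th and 97.5th percentiles of the simulated outputs as a 95% interval is an assumption about how intervals are formed.

```python
import numpy as np

rng = np.random.default_rng(2016)


def group_by_risk(risk, covariates, n_groups=10):
    """Rank patients by predicted 5-year risk and split them into equal-sized groups.

    Returns the mean predicted risk and the mean covariate vector per group,
    i.e. the 'average patient' at which model results would be evaluated.
    """
    order = np.argsort(risk, kind="stable")       # lowest risk first
    groups = np.array_split(order, n_groups)      # sizes differ by at most one
    mean_risk = np.array([risk[g].mean() for g in groups])
    mean_covariates = np.vstack([covariates[g].mean(axis=0) for g in groups])
    return mean_risk, mean_covariates


def probabilistic_estimate(model, draw_inputs, n_iter=1000, level=0.95):
    """Monte Carlo: re-run `model` on sampled inputs; return the mean and interval."""
    results = np.array([model(draw_inputs()) for _ in range(n_iter)])
    tail = 100 * (1 - level) / 2
    low, high = np.percentile(results, [tail, 100 - tail])
    return results.mean(), (low, high)


# Simulated cohort (NOT CALIBER): predicted risk plus two covariates.
n = 5000
risk = rng.beta(2, 8, size=n)
covariates = np.column_stack([rng.normal(70, 10, n), rng.integers(0, 2, n)])
mean_risk, mean_cov = group_by_risk(risk, covariates)

# Toy model: life expectancy 1/rate under an uncertain constant hazard.
mean_le, (lo, hi) = probabilistic_estimate(
    model=lambda rate: 1.0 / rate,
    draw_inputs=lambda: rng.gamma(shape=100.0, scale=0.001),
)
```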

The model was used to calculate life expectancy, QALYs, total healthcare costs and CVD-specific healthcare costs for standard care, as well as for indicative new treatments assumed to reduce CVD risks by 10%, 20%, 30% and 40%. The indicative treatments were assumed to have constant costs and treatment effects, no direct effect on the risk of non-CVD mortality and no side-effects. When interpreting the results of this analysis, it should be recognised that these assumptions may not hold in practice. The results were used to estimate the maximum price that could be charged for the new treatments in each of the risk groups, assuming a range of cost-effectiveness thresholds between £10 000 and £40 000 per QALY. The National Institute for Health and Care Excellence (NICE) employs a threshold range of £20 000 to £30 000 per QALY18 when considering whether an intervention is cost-effective in England, and recent empirical evidence provides a central estimate of the threshold in England of approximately £13 000 per QALY.27
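The maximum-price calculation follows from the definition of the incremental cost-effectiveness ratio (ICER): a new treatment is cost-effective if its incremental cost per QALY gained does not exceed the threshold. The sketch takes the constant annual cost stated above and additionally assumes that treatment continues for life. Under those assumptions, the annual price is bounded by the threshold times the QALY gain, less any change in other healthcare costs, divided by the discounted years on treatment. The Python sketch below uses hypothetical inputs for a single risk group, and the exact accounting in the authors' model may differ.

```python
def max_annual_price(delta_qalys, delta_other_costs, treated_years, threshold):
    """Highest annual price at which the new treatment's ICER <= threshold.

    ICER = (price * treated_years + delta_other_costs) / delta_qalys
    Setting ICER equal to the threshold and solving for price gives the result.

    delta_qalys: discounted QALY gain versus standard care
    delta_other_costs: change in discounted non-drug healthcare costs
        (often positive, because patients live longer and use more care)
    treated_years: discounted life years on treatment (assumed lifelong use
        at a constant annual price)
    """
    if delta_qalys <= 0:
        raise ValueError("a positive price requires a QALY gain")
    return (threshold * delta_qalys - delta_other_costs) / treated_years


# Hypothetical inputs for one risk group (NOT results from the paper).
prices = {
    threshold: max_annual_price(0.10, 500.0, 8.0, threshold)
    for threshold in (10_000, 20_000, 30_000, 40_000)
}
# prices == {10000: 62.5, 20000: 187.5, 30000: 312.5, 40000: 437.5}
```

The formula also shows why the maximum price rises with baseline risk. A given proportionate hazard reduction yields a larger QALY gain in patients with more CVD events to prevent.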

Further details about (a) patients with stable-CAD in the CALIBER dataset, (b) the economic model, (c) the estimation of costs and transition probabilities for use in the model, (d) the risk equations used to estimate model transition probabilities, (e) patient profiles for the 10 representative patients and (f) extended tables of results can be found in the accompanying online supplementary appendices. The full model source code detailing all calculations performed in the model, including the model input parameters for the 10 risk groups and 10 representative patients, together with detailed instructions on how to run the model, is available from:

Results
The average baseline patient covariates by risk group are shown in table 1. For the cohort, the mean age at cohort entry was 67 years for males and 72 years for females. Stable angina (47%) was the most frequent stable-CAD subgroup and STEMI (7%) the least. One in 10 patients had received PCI within the previous 6 months, over a quarter had heart failure, nearly one in five had depression at the time of stable-CAD diagnosis and one in six had atrial fibrillation.

Table 1

Patient characteristics by risk group

There was large variation in CVD risk between the lowest and highest risk groups, with an absolute difference in 5-year risk of 40.7 percentage points (3.5% vs 44.2%). The risk of clinical events was positively correlated with age, with higher levels of CVD risk factors (such as hypertension and diabetes) and with higher prevalence of CVD comorbidities. There were no obvious trends across risk groups in key modifiable CVD risk factors such as the lipid profile.

The modelled progression of CVD over time by risk group is shown in figure 2. Higher risk groups were predicted to have much higher levels of CVD mortality compared with lower risk groups, whereas the latter were predicted to remain event free for a much longer period and were more likely to die of non-CVD-related causes.

Figure 2

Proportion of patients in each of the six model states over time as predicted by the Markov model used in this study. Each plot within the panel represents a risk decile as categorised by the baseline 5-year CVD event risk ranging from the lowest risk decile (1) to the highest risk decile (10). As can be seen in the plots the model is run until all the patients in the cohort have experienced either a fatal CVD or a fatal non-CVD event. CVD, cardiovascular disease; MI, myocardial infarction.

Summary model results by risk group are shown in table 2. The risk of all non-fatal events increased with overall CVD risk, and the risk of non-CVD mortality declined with overall CVD risk. Lower risk patients were estimated to have greater remaining life expectancy, QALYs and healthcare costs. For the lowest risk group (5-year CVD risk 3.5%), the remaining expected discounted lifetime healthcare costs were £62 210, and patients had 12.0 expected discounted QALYs remaining. For the highest risk group (5-year CVD risk 44.2%), the remaining expected discounted lifetime healthcare costs were £35 549, and patients had 2.8 remaining expected discounted QALYs.

Table 2

Model results by risk group

Figure 3 shows the maximum price that the health system should be willing to pay for new treatments targeted at each risk group that reduce CVD hazards by between 10% and 40%. This maximum price increased with both increasing baseline risk and with larger treatment effects in terms of proportionate risk reduction.

Figure 3

Maximum annual price for therapies as a function of baseline 5-year CVD event risk. Each plot within the panel shows the results at a given cost-effectiveness threshold ranging from £10 000 to £40 000 per QALY. The lines within each plot represent modelled treatments with hazard reductions on CVD endpoints ranging from 10% to 40%. CVD, cardiovascular disease; MI, myocardial infarction; QALYs, quality-adjusted life years.

More detailed breakdowns of these results as well as results presented for the representative patients drawn from each risk group can be found in online supplementary appendix (f).

Discussion
We report the first comprehensive lifetime model of stable-CAD based on long-term EHR data. The model encompasses a full range of CVD endpoints and accounts for the interdependence of CVD risks among patients with stable-CAD. The sample size, duration of follow-up and large number of endpoints and risk factors captured by the multisource EHR dataset (CALIBER) allowed us to build a model that captures the clinical course of this condition more fully and accurately. By quantifying the expected costs, life expectancy and quality-adjusted life expectancy of patients with stable-CAD, this analysis provides a means to plan budgets and services for such patients in the NHS in particular, and in health systems in developed countries more generally.

We found that, at NICE's lower bound cost-effectiveness threshold (£20 000 per QALY), a treatment aimed at the lowest risk patients (5-year risk of 3.5%) would be cost-effective at annual prices up to £36, £72, £108 or £143 if it reduced CVD risk by 10%, 20%, 30% or 40%, respectively. For the highest risk patients (5-year risk of 44.2%), the respective maximum prices would be £325, £645, £961 or £1269. For comparison, statins commonly used by these patients reduce CVD risk by approximately a third28 and cost £16 per patient per year,29 whereas new antiplatelet agents can cost up to £712 per patient per year.29 These estimates give developers of new medications and health technologies for stable-CAD a basis for defining the effect sizes they will need to demonstrate for their products to be considered value for money by health systems.

In this study it has been shown that using EHR data, in combination with an analytical model such as that used by NICE in the English NHS, provides a powerful framework within which to assess the cost-effectiveness of new technologies. In the many healthcare systems with constrained budgets, cost-effectiveness analysis provides a means of comparing the additional health benefits from a new intervention with the health other patients forgo because expenditure on other types of treatments is necessarily curtailed in order to finance the new intervention (opportunity costs).30 The current analysis uses this approach as a basis for identifying the minimum treatment effect a new intervention for stable-CAD will have to achieve at a given price (or the maximum price for a given treatment effect) and cost-effectiveness threshold. These necessary treatment effects and prices will inevitably vary according to patients’ underlying risk of CVD events.

There are very few comparable studies that model the costs and health effects of stable-CAD over patients' lifetimes. The studies we are aware of in this area13 are typically based on short-term trial data, model only a subset of the relevant CVD endpoints and make predictions over short time horizons. Models suitable for the economic evaluation of health technologies in disease areas such as CVD, where there are substantial mortality impacts, need to estimate all relevant healthcare costs and health outcomes over the remaining lifetimes of patients. This is why, despite having 10 years of follow-up data, we still required a model to extrapolate up to a maximum of 60 years beyond our data to estimate total lifetime costs and consequences for the full cohort of modelled patients. Our study has several limitations. First, HRQoL data were not recorded in the CALIBER dataset and so had to be drawn from external studies. Second, changes in prognostic risk factors over time were not explicitly modelled; instead, the equations underpinning our model were informed by the baseline values of these risk factors. Third, the dataset we used did not contain left ventricular ejection fraction, which is an important prognostic factor in this patient population. Fourth, because of the long follow-up period of our dataset, the modelled risk equations may not fully reflect contemporary risk levels in the population. Additionally, a number of structural assumptions had to be made for modelling purposes; these are detailed in online supplementary appendix (b).

The model we have produced allows policy makers to quantify and understand both the health and the cost burden of stable-CAD and serves as a basis for evaluating the cost-effectiveness of new treatments targeted at reducing CVD risk in this population. Our results suggest that, for the vast majority of patients with stable-CAD, low-cost interventions to improve adherence to existing secondary prevention drugs should be prioritised over high-cost new treatments. It is also notable that, even among the groups with the highest CVD risk, more patients are predicted to die of non-CVD-related causes than of CVD-related causes. This highlights the vital role of primary care in the holistic management of both CVD and non-CVD risk for these patients.

Key messages

What is already known on this subject?

  • Electronic health records have been shown to be useful in prognosis, but thus far their use in decision analytic models and cost-effectiveness analysis has been limited.

  • The recent improvement in acute coronary syndrome survivorship means that a growing number of people are living with cardiovascular disease.

What might this study add?

  • This study provides the first lifetime model of the costs and health effects of patients with stable coronary artery disease based on long-term linked electronic health records, predicting key cardiovascular endpoints for these patients and capturing the interdependence of these endpoints.

How might this impact on clinical practice?

  • This model can be used to evaluate and to target appropriately new treatments as they emerge for this patient population as well as to inform commissioning, pricing and reimbursement decisions.

Acknowledgements
The authors would like to acknowledge the following people for help in understanding the CALIBER dataset and prognostic models: Eleni Rapsomaniki, Pablo Perel, Spiros Denaxas, Katja Grasic, Julie George, Owen Nicholas, Ruzan Udumyan, Gene Feder, Aroon Hingorani, Emily Herrett, Dipak Kalra, Mike Kivimaki and Liam Smeeth. This work made use of the facilities of N8 HPC provided and funded by the N8 consortium and EPSRC (Grant No. EP/K000225/1). The N8 HPC Centre is coordinated by the Universities of Leeds and Manchester.




  • Statistical package: R (V.3.1.0) and the R package flexsurv (V.0.3) were used to conduct the statistical analysis in the paper.

  • Contributors MA conducted the main data analysis and drafted and revised the paper and is the guarantor of the study. SW, SP and MS advised on health economic issues and helped to design the model. ADS, CPG, AT and HH advised on clinical issues. KRA, MC and AM advised on statistical issues. AT and HH were responsible for the overall grant from the NIHR. All authors commented on drafts of the paper. All authors, external and internal, had full access to all of the data (including statistical reports and tables) in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. MA affirms that the manuscript is an honest, accurate and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

  • Funding The study was funded by the UK National Institute for Health Research (NIHR) (RP-PG-0407-10314) and the Wellcome Trust (WT 086091/Z/08/Z) and was supported by the Farr Institute of Health Informatics Research, funded by The Medical Research Council (K006584/1) in partnership with Arthritis Research UK, the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Engineering and Physical Sciences Research Council, the NIHR, the National Institute for Social Care and Health Research (Welsh Assembly Government), the Chief Scientist Office (Scottish Government Health Directorates) and the Wellcome Trust. CPG is funded by the National Institute for Health Research (NIHR-CTF-2014-03-03) as associate professor and honorary consultant cardiologist. MC is partly funded by a National Institute for Health Research Doctoral Fellowship (DRF-2012-05-409). ADS is funded by a Wellcome Trust Clinical Research Training Fellowship (093830/Z/10/Z). KRA and MS are partially supported by the National Institute for Health Research as senior investigators (NF-SI-0512-10159 and NF-SI-0513-10060, respectively). The funding bodies did not play any role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication. The views expressed in this publication are those of the authors.

  • Competing interests All authors have completed the Unified Competing Interests form at (available on request from the corresponding author) and declare that MA, SW, SP, CPG, ADS, MC and HH have nothing to disclose. AM currently sits on one of the NICE Technology Appraisal Committees. MS received grant funding for the work reported in this paper from the National Institute for Health Research. Outside of the published work, he has received personal fees from various pharmaceutical and medical device companies, some of which have products used in cardiovascular disease. AT reports personal fees from Menarini Pharmaceuticals and other support from Servier, outside the submitted work. KRA reports personal fees from ABPI, Roche, Novo Nordisk, AstraZeneca, Janssen and Allergan, outside the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement An extensive supplementary appendix with additional analyses has been submitted alongside the paper and the model source code and instructions on how to use it to reproduce the results in the paper are available at
