Objective The original EuroSCORE models are poorly calibrated for predicting mortality in contemporary cardiac surgery. EuroSCORE II has been proposed as a new risk model. The objective of this study was to assess the performance of EuroSCORE II in UK cardiac surgery.
Design A cross-sectional analysis of prospectively collected multi-centre clinical audit data, from the Society for Cardiothoracic Surgery in Great Britain and Ireland Database.
Setting All NHS hospitals, and some UK private hospitals performing adult cardiac surgery.
Patients 23 740 procedures at 41 hospitals between July 2010 and March 2011.
Main outcome measures The main outcome measure was in-hospital mortality. Model calibration (Hosmer–Lemeshow test, calibration plot) and discrimination (area under receiver operating characteristic curve) were assessed in the overall cohort and clinically defined sub-groups.
Results The mean age at procedure was 67.1 years (SD 11.8) and 27.7% were women. The overall mortality was 3.1% with a EuroSCORE II predicted mortality of 3.4%. Calibration was good overall but the model failed the Hosmer–Lemeshow test (p=0.003) mainly due to over-prediction in the highest and lowest-risk patients. Calibration was poor for isolated coronary artery bypass graft surgery (Hosmer–Lemeshow, p<0.001). The model had good discrimination overall (area under receiver operating characteristic curve 0.808, 95% CI 0.793 to 0.824) and in all clinical sub-groups analysed.
Conclusions EuroSCORE II performs well overall in the UK and is an acceptable contemporary generic cardiac surgery risk model. However, the model is poorly calibrated for isolated coronary artery bypass graft surgery and in both the highest and lowest risk patients. Regular revalidation of EuroSCORE II will be needed to identify calibration drift or clinical inconsistencies, which commonly emerge in clinical prediction models.
- Risk assessment
- EuroSCORE II
- cardiac surgery
- coronary artery bypass grafting
- aortic valve replacement
Risk prediction models have played an important role in cardiac surgery for over 20 years. They are used to provide information on risk to clinicians and patients, to guide clinical decision-making and to benchmark clinical services. One of the first cardiac surgery risk prediction models to be published was the Parsonnet score, which was developed on data from 3500 cardiac surgery procedures performed in the USA in the 1980s.1 Although there were concerns that some key risk factors, such as respiratory disease, were missing from the Parsonnet score, it was widely accepted. However, over time and with improved surgical outcomes, the performance of the model was shown to be poor2,3 and more contemporary models were developed.
The EuroSCORE model was developed using data from 14 781 patients who had undergone cardiac surgery in eight European countries during 1995 and was initially published in 1999 as an additive model.4 This model was assessed in a number of different countries both inside and outside Europe and found to be valid.5,6 However, the accuracy of the additive model, particularly in high-risk patients, was questioned and the logistic version of the EuroSCORE was published in 2003.7 Although the logistic model required a computer program to calculate, it had the advantage of improved calibration, particularly in high-risk patients.8
The logistic EuroSCORE has been the risk prediction model of choice for European cardiac surgery for the past 9 years; however, it now over-predicts the risk of in-hospital mortality for contemporary cardiac surgery.9–11 There are also potential limitations with how the EuroSCORE assigns risk among patients with renal disease12 and for those undergoing multiple cardiac procedures. In addition, concerns have been raised about the performance of the model in high-risk patients.13 To address these issues EuroSCORE II has been developed using data from 22 381 patients who underwent cardiac surgery during 2010.14 Before EuroSCORE II can be accepted as an appropriate risk model it requires validation in the population in which it is intended to be used. The objective of this study was to assess the performance of the EuroSCORE II in the UK adult cardiac surgery population.
Materials and methods
Data collection, validation and cleaning
Prospectively collected data for all cardiac surgery procedures performed in the UK between 26 July 2010 and 31 March 2011 were extracted from the Society for Cardiothoracic Surgery in Great Britain and Ireland (SCTS) Database. An algorithm was developed to clean the data by removing duplicate records, correcting transcriptional discrepancies and resolving any clinical or temporal conflicts. At this stage, and prior to any analysis, data were returned to each contributing hospital for local validation.
The primary outcome for the study was in-hospital mortality, which was defined as death due to any cause at the base hospital during admission for cardiac surgery. The in-hospital mortality data for each procedure were validated against Office for National Statistics records, the statutory deaths register of the UK. The following records were excluded from the analysis: transplants, trauma and ventricular assist device procedures; patients >95 years of age; records with in-hospital mortality status missing and no Office for National Statistics linkage data available; any procedures which were not a single or the first cardiac procedure in any admission spell. Definitions of the risk factors used for the study are available from http://www.ucl.ac.uk/nicor/audits/Adultcardiacsurgery/datasets
Risk factor imputation
If any of the categorical/binary risk factor data (with the exception of creatinine clearance) required for calculation of the EuroSCORE II model were missing for a record it was imputed using the standard SCTS assumption that missing risk factor data is equivalent to absence of the risk factor. The percentage of missing data for each original EuroSCORE risk factor was <5%. As the SCTS Database was not specifically designed to collect EuroSCORE II risk factors, it was necessary for different methods of imputation to be applied for some risk factors and for some to be inferred from other database fields. These imputations were checked using sensitivity analyses and are detailed below.
Measured serum creatinine values were not collected for the majority of records; however, indicators of renal function such as serum creatinine >200 μmol/l or pre-operative dialysis were available for over 95% of records. The overall validation analysis uses these indicators to impute pseudo-values for serum creatinine, which are subsequently mapped to the EuroSCORE II creatinine clearance categories. To assess this assumption, actual missing serum creatinine values for patients not identified as being on dialysis or having a serum creatinine measurement >200 μmol/l were imputed with random values from a lognormal distribution with mean and SD (on log-scale) calculated from the available measured serum creatinine values. This imputation method was repeated 20 times, with model performance assessed each time and then averaged, and with 95% CIs calculated using Rubin's rules.15 A separate analysis of model performance in the subset of patients with measured serum creatinine was also performed.
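The repeated-imputation procedure described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function names are hypothetical, the creatinine values are invented for demonstration, and the per-imputation variances passed to `pool_rubin` are assumed to come from whichever performance measure is being pooled. A normal approximation is used for the interval in place of Rubin's t-based reference distribution.

```python
import math
import random
import statistics


def fit_lognormal(measured):
    """Log-scale mean and SD of the measured serum creatinine values."""
    logs = [math.log(value) for value in measured]
    return statistics.mean(logs), statistics.stdev(logs)


def impute_missing(values, mu, sigma, rng):
    """Fill missing entries (None) with draws from lognormal(mu, sigma)."""
    return [rng.lognormvariate(mu, sigma) if v is None else v for v in values]


def pool_rubin(estimates, variances, z=1.96):
    """Pool per-imputation estimates and variances (needs >= 2 imputations).

    Returns the pooled estimate and an approximate 95% CI.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between     # total variance
    half_width = z * math.sqrt(total)
    return pooled, (pooled - half_width, pooled + half_width)


# Illustrative values only (umol/l); not data from the study.
rng = random.Random(2011)
mu, sigma = fit_lognormal([64.0, 78.0, 95.0, 110.0, 130.0])
completed = impute_missing([78.0, None, 110.0, None], mu, sigma, rng)
```

In the study design, the imputation step would be repeated 20 times, the model re-scored on each completed dataset, and the 20 resulting estimates passed to the pooling step.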
Left ventricular ejection fraction categories good and moderate were mapped consistently from original EuroSCORE categories to the EuroSCORE II categories. EuroSCORE II splits the original EuroSCORE poor left ventricular ejection fraction category into two categories: poor (21–30%) and very poor (≤20%). Therefore, unless an actual continuous ejection fraction value was recorded, all patients categorised as original EuroSCORE poor left ventricular ejection fraction were categorised as having EuroSCORE II poor ejection fraction (21–30%). This imputation was checked for consistency against the categorisation for all records where an actual ejection fraction was recorded. No data for poor mobility due to musculoskeletal dysfunction were available from the original EuroSCORE dataset, meaning that only patients with poor mobility due to neurological dysfunction were identified as having the EuroSCORE II risk factor poor mobility.
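The ejection fraction mapping rule can be written down compactly. The sketch below is an illustration under stated assumptions, not the study's code: the function name and the category labels are hypothetical, and a recorded continuous value is used only to split the original poor category at the 20% threshold given above.

```python
def euroscore2_lvef_category(original_category, ejection_fraction=None):
    """Map an original EuroSCORE LVEF category to a EuroSCORE II category.

    original_category: "good", "moderate" or "poor" (hypothetical labels).
    ejection_fraction: recorded continuous value in percent, or None.
    """
    if original_category in ("good", "moderate"):
        # These two categories carry over unchanged.
        return original_category
    if original_category != "poor":
        raise ValueError(f"unknown category: {original_category!r}")
    if ejection_fraction is not None and ejection_fraction <= 20:
        return "very poor"
    # No continuous value recorded (or value above 20%): default to poor.
    return "poor"
```

Because records without a continuous value default to poor rather than very poor, any misallocation under this rule can only lower the predicted risk.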
Following risk factor imputation, the EuroSCORE II was calculated for each record. Model performance was assessed using measures of calibration and discrimination in the overall cohort and important clinical sub-groups. Model calibration was evaluated using both calibration plot methodology and the Hosmer–Lemeshow test.16 The calibration plot presented shows the mean predicted probability of outcome against the observed proportion of outcomes for 10 equally sized groups based on the ranked predicted risks calculated by the models. The plot is overlaid with the optimal calibration line. A non-parametric locally weighted scatterplot smoothing curve showing the general trend is also displayed. It should be noted that no inferences can be made regarding the locally weighted scatterplot smoothing curve on the calibration plot and that this curve is merely included to illustrate the predictive trend with respect to the observed mortality. Approximate 95% CIs for the observed mortality proportions are shown as error bars, estimated using the Agresti-Coull approximation.
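The calibration measures described above can be illustrated with a short sketch. It is written in Python with hypothetical function names, whereas the study used R. It assumes that predicted risks lie strictly between 0 and 1, that there are at least as many records as groups, and that tied predictions keep their input order. The chi-square tail function is a closed form that is valid only for an even number of degrees of freedom.

```python
import math


def decile_groups(predicted, observed, groups=10):
    """Rank records by predicted risk and split into near-equal groups."""
    ranked = sorted(zip(predicted, observed), key=lambda record: record[0])
    n = len(ranked)
    bounds = [round(i * n / groups) for i in range(groups + 1)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(groups)]


def hosmer_lemeshow(predicted, observed, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ranked groups."""
    statistic = 0.0
    for group in decile_groups(predicted, observed, groups):
        n = len(group)
        expected = sum(p for p, _ in group)   # expected deaths in the group
        deaths = sum(y for _, y in group)     # observed deaths in the group
        mean_risk = expected / n
        statistic += (deaths - expected) ** 2 / (n * mean_risk * (1 - mean_risk))
    return statistic


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def agresti_coull(deaths, n, z=1.96):
    """Approximate 95% CI for an observed proportion (Agresti-Coull)."""
    n_adj = n + z * z
    p_adj = (deaths + z * z / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

The number of degrees of freedom to pair with the statistic is a modelling choice that the text does not state. For external validation with 10 groups, 10 degrees of freedom is a common convention and is the assumption made when calling `chi2_sf_even(statistic, 10)`.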
Model discrimination was evaluated by determining the receiver operating characteristic curve, which is summarised by the area under the curve (AUC).17 DeLong's method for calculating AUC variance was used for the calculation of AUC 95% CIs.18 All p values <0.05 were considered significant. All statistical analyses were performed using R software (R Development Core Team 2011).19
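For readers unfamiliar with the AUC and its DeLong variance, a minimal sketch follows. It is an illustration in Python with hypothetical function names, not the R routine used in the study. It compares every death with every survivor directly, which is transparent but slow for large cohorts, and it truncates the interval to the range 0 to 1.

```python
import math
import statistics


def _psi(case_score, control_score):
    """Placement kernel: 1 if the case outranks the control, 0.5 for a tie."""
    if case_score > control_score:
        return 1.0
    return 0.5 if case_score == control_score else 0.0


def auc_delong(predicted, observed, z=1.96):
    """AUC with an approximate 95% CI from DeLong's variance estimate.

    Needs at least two deaths (observed == 1) and two survivors (observed == 0).
    """
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    controls = [p for p, y in zip(predicted, observed) if y == 0]
    # Structural components: one value per death and one per survivor.
    v10 = [sum(_psi(c, d) for d in controls) / len(controls) for c in cases]
    v01 = [sum(_psi(c, d) for c in cases) / len(cases) for d in controls]
    auc = sum(v10) / len(v10)
    variance = (statistics.variance(v10) / len(cases)
                + statistics.variance(v01) / len(controls))
    half_width = z * math.sqrt(variance)
    return auc, (max(0.0, auc - half_width), min(1.0, auc + half_width))
```

The AUC returned here equals the proportion of death and survivor pairs in which the death received the higher predicted risk, with ties counted as one half.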
For the study period 24 008 records were available. There were a total of 268 records that met the exclusion criteria leaving a final cohort for analysis of 23 740 records. The mean age of the population was 67.1 years (SD 11.8) and 27.7% of patients were female. Other patient characteristics are shown in table 1. The majority of patients (12 470, 52.5%) underwent isolated coronary artery bypass graft (CABG) surgery. There were 7316 (30.8%) procedures performed on the aortic valve and 2944 (12.4%) procedures performed on the mitral valve. In total there were 7584 (31.9%) non-elective procedures. Full operative details for the study population are shown in table 2.
Overall model performance
The overall mortality was 3.1% and the EuroSCORE II predicted mortality was 3.4%, giving an observed to expected (O:E) ratio of 0.92. Visual inspection of the calibration plot (figure 1) demonstrates good calibration of the EuroSCORE II model for the majority of deciles. However, there is an obvious over-prediction of mortality for both the first and last deciles, as shown in both figure 1 and table 3. As a result, the model failed the Hosmer–Lemeshow test (χ2=27.07, p=0.003). EuroSCORE II showed good discrimination overall with an AUC of 0.808 (95% CI 0.793 to 0.824).
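The O:E ratio is the number of observed deaths divided by the sum of the predicted risks. A minimal sketch in Python, with a hypothetical function name:

```python
def observed_to_expected(observed, predicted):
    """Observed deaths divided by the sum of predicted mortality risks.

    observed: iterable of 0/1 outcomes; predicted: iterable of risks in (0, 1).
    A ratio below 1 indicates that the model over-predicts mortality.
    """
    return sum(observed) / sum(predicted)


# Using the rounded percentages quoted in the text as rates:
ratio_from_rounded_rates = round(3.1 / 3.4, 2)
```

The rounded percentages give a ratio of about 0.91. The reported value of 0.92 is presumably computed from unrounded figures; this is an assumption, as the underlying counts are not given in the text.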
Model performance by weight of intervention
Calibration of the model was poor for isolated CABG, with a significant over-prediction of mortality (O:E ratio of 0.71). The O:E ratios for the other weight of intervention sub-groups were close to 1.0 (table 4). All sub-groups failed the Hosmer–Lemeshow test at the 5% level, with the exception of the single non-CABG procedure group and the 2-procedures group (table 4). Only the isolated CABG group failed the Hosmer–Lemeshow test at the 1% significance level. EuroSCORE II demonstrated good discrimination for all sub-groups based on weights of interventions (table 4).
Model performance for aortic valve replacement
There were a total of 3116 isolated aortic valve replacements (AVR) and 2401 AVR + CABG procedures. The observed mortality for isolated AVR was 2.1% compared to a predicted mortality rate of 2.6% (O:E ratio 0.81). The observed mortality for AVR + CABG was 4.4% compared to a predicted mortality rate of 4.8% (O:E ratio 0.92). EuroSCORE II demonstrated good fit based on the Hosmer–Lemeshow test for both isolated AVR procedures (p=0.17) and AVR + CABG procedures (p=0.25). EuroSCORE II also demonstrated good discrimination for both groups, with AUCs of 0.772 (95% CI 0.712 to 0.833) and 0.720 (95% CI 0.670 to 0.770) for isolated AVR and AVR + CABG respectively.
Sensitivity analysis for renal function
The primary sensitivity analysis for the creatinine assumption was to replace the pseudo-imputed creatinine values with a random value drawn from an appropriate distribution; a form of multiple imputation. The average AUC of 0.806 (95% CI 0.790 to 0.822) was similar to that for the overall validation. The O:E ratio decreased by 0.06 to 0.86 (95% CI 0.80 to 0.92). Inspection of the multiplicative coefficients which transform serum creatinine into creatinine clearance rates (using the Cockcroft-Gault approximation) justified the decision not to subject these values to multiple imputation as it would not have had a substantial effect on the EuroSCORE II creatinine clearance category allocation. Moreover, a large number of these patients were on pre-operative dialysis which is assessed independently of creatinine clearance by EuroSCORE II.
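The Cockcroft-Gault approximation makes the role of the multiplicative coefficients explicit: estimated clearance is inversely proportional to serum creatinine, scaled by age, weight and sex. The sketch below is an illustration in Python with hypothetical function names. The category cut-points of 85 and 50 ml/min are an assumption about the EuroSCORE II definitions and should be checked against the EuroSCORE II publication before use.

```python
UMOL_PER_MG_DL = 88.4  # serum creatinine: 1 mg/dl is about 88.4 umol/l


def cockcroft_gault(age_years, weight_kg, creatinine_umol_l, female):
    """Estimated creatinine clearance in ml/min (Cockcroft-Gault)."""
    creatinine_mg_dl = creatinine_umol_l / UMOL_PER_MG_DL
    clearance = (140 - age_years) * weight_kg / (72 * creatinine_mg_dl)
    return clearance * 0.85 if female else clearance


def renal_category(clearance_ml_min, on_dialysis=False):
    """Assign a renal function category (cut-points are assumed, see above)."""
    if on_dialysis:
        # Dialysis is handled separately from creatinine clearance.
        return "dialysis"
    if clearance_ml_min > 85:
        return "normal"
    if clearance_ml_min >= 50:
        return "moderate"
    return "severe"
```

For example, a 70-year-old man weighing 80 kg with a serum creatinine of 100 μmol/l has an estimated clearance of roughly 69 ml/min under this formula, so a modest change in the imputed creatinine value would often leave the category unchanged.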
There were 960 records with a recorded serum creatinine value available. For this subset of records the results of the EuroSCORE II validation were very close to the former sensitivity analysis with an AUC 0.823 (95% CI 0.758 to 0.889) and O:E ratio 0.84. The Hosmer–Lemeshow test did not reject the null hypothesis of model calibration (p=0.420).
This study demonstrates that EuroSCORE II performs well overall in contemporary UK adult cardiac surgery. EuroSCORE II demonstrates good discrimination for all cardiac surgery, and the clinical sub-groups examined here. EuroSCORE II is also well calibrated for cardiac surgery overall and in a number of important clinical sub-groups including aortic valve replacement surgery. The logistic EuroSCORE is now obsolete and this study demonstrates that it is appropriate to use EuroSCORE II as a generic risk model for contemporary UK cardiac surgery. EuroSCORE II is, however, poorly calibrated for isolated CABG surgery in the UK. This limitation is important because isolated CABG accounts for more than half of UK cardiac surgery.
This study utilises a large clinical dataset, the inputs of which have been validated locally at each contributing hospital. The dataset is truly representative of UK cardiac surgery with all NHS hospitals and some private hospitals performing cardiac surgery included. As with most clinical studies, some data were missing. However, all original EuroSCORE risk factors had <5% missing data and the number of records excluded from the analysis because of missing in-hospital mortality data represents only 0.2% of the overall data available.
A limitation of this study is that three risk factors were not available as precisely defined in the EuroSCORE II manuscript. To address this issue, data imputations were made and these imputations were then checked using sensitivity analyses. For creatinine clearance, other data available on renal function were used to assign patients to the appropriate category. The results of the overall analysis did not change across the renal sensitivity analyses, demonstrating that the findings are robust. For the small number of records (approximately 5%) where there was potential doubt over the appropriate ejection fraction category allocation, records were allocated to the highest suitable category. This policy will have resulted in a minor under-estimation of mortality risk, as will the fact that only poor mobility information secondary to neurological disease was available. If the EuroSCORE II were adopted as the model of choice for UK cardiac surgery, refinement to ensure complete data for all required risk factors would assist with future model validations.
Compared to the population used to derive EuroSCORE II, the UK population was older but had a lower proportion of females. Unfortunately a number of EuroSCORE II risk factors are not directly comparable between the EuroSCORE II population and this study population as they were not published in the EuroSCORE II manuscript. A potentially important limitation of EuroSCORE II is that only major cardiac procedures are included in the weight of intervention risk factor. As a result, patients who undergo cardiac surgery with concomitant thoracic or vascular surgery would not necessarily be allocated incremental risk above the baseline risk. It should also be noted that a number of other groups of patients which are likely to become more prominent, for example, those aged >95 years and patients undergoing trans-catheter aortic valve implantation were not included in either the EuroSCORE II dataset or the dataset used in this study.
This study represents a comprehensive assessment of EuroSCORE II performance. No single statistical test can be used to validate a risk prediction model, but various tests can together describe model performance, which in turn indicates how useful the model is. Therefore, in this study, calibration, discrimination and the clinical validity of EuroSCORE II have all been assessed. Calibration was assessed in detail using both calibration plot methodology and the Hosmer–Lemeshow test. Model discrimination has been assessed using the widely accepted method of AUC. Although model performance in some important clinical sub-groups has been assessed to determine the clinical validity of EuroSCORE II, an inevitable limitation is that there will be other important clinical sub-groups which have not been assessed as part of this study.
This study also represents the largest published validation of the EuroSCORE II to date. A validation of EuroSCORE II has been carried out for isolated CABG surgery in Finland. This validation on over 1000 patients also demonstrated that, for isolated CABG, EuroSCORE II had good discrimination and a degree of over-prediction.20 Although the present study has demonstrated that EuroSCORE II is currently well calibrated overall for UK cardiac surgery, it is highly likely that this will change with time, leading to the calibration drift which affected the original EuroSCORE models. Model discrimination appears to be relatively resistant to drift over time;14 however, it is accurate calibration that is essential for clinical decision-making and risk adjusted governance analyses.21
Risk prediction models are used to inform patients and clinicians about the risks of surgery. When using models for this purpose it is vital that the clinician should (1) have formally derived the underlying prediction, (2) know the extent to which their own performance is reflected in the prediction, and (3) adjust the estimate up or down for important risk factors not captured in the prediction model. Models can also be used to guide clinical decision-making, with a recent example being the use of the Society of Thoracic Surgeons (STS) score to help determine patient suitability for aortic valve replacement in the PARTNER trial.22 Another important use of risk prediction models is for the risk adjustment of published governance analyses. Accurate risk prediction and clinical acceptance of the risk prediction model used for such analyses are vital if risk-averse clinical decisions are to be avoided.
EuroSCORE II was developed as a generic model for all cardiac surgery and this study has validated it primarily for this purpose. However, it is possible that procedure specific risk models such as those developed by the STS,23–25 which allow the incorporation of risk factors specific to those individual procedures, may be more appropriate. A number of models have been developed specifically for aortic valve surgery but are not widely used.26,27 High risk cardiac surgery is an area where model performance has often been found to be poor and it may be that models specifically for high-risk surgery should be used in the future. Models which predict important post-operative complications such as stroke and renal failure could also play an important role in informed consent and clinical decision making. EuroSCORE II represents a necessary and timely update of the original EuroSCORE models. Although EuroSCORE II has some limitations, particularly its calibration for isolated CABG surgery, it is an appropriate generic risk model of choice for UK cardiac surgery.
The authors acknowledge all members of the Society for Cardiothoracic Surgery in Great Britain and Ireland who contribute data to the SCTS Database. We would also like to acknowledge the National Institute for Cardiovascular Outcomes Research (NICOR), UCL, for their role in supplying the data for this study.
Funding This research was partly funded by Heart Research UK Grant RG2583.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.