Objectives: To study the ability of the logistic EuroSCORE to predict operative risk in contemporary cardiac surgery.
Design: Retrospective analysis of prospectively collected data.
Setting: All National Health Service centres undertaking adult cardiac surgery in northwest England.
Patients: All patients undergoing cardiac surgery between April 2002 and March 2004.
Main outcome measures: The predictive ability of the logistic EuroSCORE was assessed in terms of its discrimination between patients with differing observed risk, measured by the area under the receiver operating characteristic (ROC) curve, and its calibration against observed in-hospital mortality. The performance of the EuroSCORE was examined in the following surgical subgroups: all cardiac surgery, isolated coronary artery surgery, isolated valve surgery, combined valve and coronary surgery, mitral valve surgery, aortic valve surgery and other surgery.
Results: 9995 patients underwent surgery. The discrimination of the logistic EuroSCORE was good, with an ROC curve area of 0.79 for all cardiac surgery (range 0.71–0.79 in the subgroups). For all operations, the predicted mortality was 5.7% and the observed mortality was 3.3%. The logistic EuroSCORE overpredicted observed mortality in all subgroups, but by differing degrees (p = 0.02).
Conclusions: The logistic EuroSCORE is a reasonable overall predictor for contemporary cardiac surgery but overestimates observed mortality. Its accuracy at predicting risk in different surgical subgroups varies. The logistic EuroSCORE should be recalibrated before it is used to gain reassurance about outcomes. Caution should be exercised when using it to compare hospitals or surgeons with a different operative case mix.
Growing interest over recent years in the mortality outcomes of hospitals and of individual surgeons has led to the publication of named-surgeon outcomes for coronary artery surgery and aortic valve replacement in the UK.1,2 Case mix is known to vary greatly both between hospitals and between surgeons.3,4 Some form of adjustment for these differences is believed to be important if meaningful comparisons are to be made.
The additive EuroSCORE is a widely used risk prediction algorithm for cardiac surgery.5 It allocates incremental risk points to 17 risk factors, and the sum of these points gives a score that reflects predicted operative mortality. It has reasonable overall predictive ability for both coronary and valve surgery but is known to overpredict risk compared with observed mortality in contemporary coronary artery surgery. It has also been shown to predict poorly in higher-risk patients, and it underpredicts observed mortality for combined valve and coronary surgery.3,4,6,7
More recently, the logistic EuroSCORE has come into use.8,9 It uses a more complex algorithm, a logistic regression equation, to derive risk from the same preoperative and operative risk factors. It is claimed to be a better predictor of operative risk than the additive model, but this has not been studied systematically. We have therefore analysed the predictive ability of the logistic EuroSCORE on a large contemporary cardiac surgery database in northwest England.
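An additive score of this kind can be sketched as a plain sum over the risk factors present. The sketch below assumes a hypothetical mapping from factor names to points; the names and weights are illustrative placeholders, not the published EuroSCORE weights (reference 5 gives those).

```python
# Minimal sketch of an additive risk score: each risk factor recorded as
# present contributes a fixed number of points, and the score is their sum.
# HYPOTHETICAL factor names and weights, for illustration only.
HYPOTHETICAL_POINTS = {
    "previous_cardiac_surgery": 3,
    "unstable_angina": 2,
    "emergency": 2,
    "female": 1,
}


def additive_score(factors_present, points):
    """Sum the points for every factor present.

    An unrecognised factor name raises KeyError rather than being silently
    ignored, so coding errors in the input are not hidden.
    """
    return sum(points[name] for name in factors_present)


print(additive_score({"emergency", "female"}, HYPOTHETICAL_POINTS))  # 3
```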
The North West Quality Improvement Programme in Cardiac Interventions is a regional consortium of all four NHS centres (Blackpool Victoria Hospital; The Cardiothoracic Centre, Liverpool; Manchester Royal Infirmary; South Manchester University Hospital) performing adult cardiac surgery and percutaneous coronary interventions in the north west of England. The goal of the group is to improve continuously the quality of care for patients receiving cardiac interventions by using a regionally based systems approach.10
Data were collected prospectively on all patients undergoing adult cardiac surgery in the north west of England between 1 April 2002 and 31 March 2004. For each patient, a dataset of preoperative and operative variables was collected so that predicted mortality could be calculated. Data were collected and validated in each institution and returned to a central source for analysis. Mortality was defined as any in-hospital death.
Design of the study
Two specific questions were addressed:
Is the logistic EuroSCORE a good overall predictor of operative mortality for cardiac surgery?
Does the logistic EuroSCORE predict observed mortality equally well in the various surgical subgroups?
All data were analysed with SAS for Windows V.8.2 (SAS Institute, Cary, North Carolina, USA). Categorical data are shown as percentages. Predicted mortality was calculated for each patient by using the logistic EuroSCORE formula.9 If a patient factor needed to calculate the EuroSCORE was missing from the record, that factor was assumed to be absent (this occurred in fewer than 2% of cases).
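The logistic EuroSCORE turns a weighted sum of risk factors, z, into a probability through the inverse logit, p = e^z / (1 + e^z). A minimal sketch of that step, including the convention above that a missing factor is treated as absent, might look as follows. The intercept, coefficients and factor names are hypothetical placeholders, not the published values in reference 9.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
HYPOTHETICAL_INTERCEPT = -4.8
HYPOTHETICAL_COEFS = {
    "previous_cardiac_surgery": 1.0,
    "emergency": 0.7,
    "female": 0.3,
}


def logistic_predicted_mortality(record, intercept, coefs):
    """Predicted mortality (0 to 1) from a logistic model.

    A factor missing from the record, or recorded as None, is coded 0,
    i.e. treated as absent.
    """
    z = intercept
    for name, beta in coefs.items():
        value = record.get(name)
        x = 0 if value is None else value
        z += beta * x
    # Inverse logit: exp(z) / (1 + exp(z)), written in an equivalent form.
    return 1.0 / (1.0 + math.exp(-z))


p = logistic_predicted_mortality({"emergency": 1, "female": None},
                                 HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS)
print(f"{p:.4f}")
```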
We have assessed the predictive ability of the logistic EuroSCORE in two ways. Firstly, does the model discriminate appropriately between groups of patients with different observed risk? Secondly, does the numerical value produced by the model accurately reflect observed mortality (that is, is it calibrated appropriately)?
To assess the discriminatory ability of the logistic EuroSCORE we have used the area under the receiver operating characteristic (ROC) curve.11,12 An area of 0.5 reflects no discrimination and an area of 1.0 indicates a perfect predictor. Areas greater than 0.7 are generally thought to be useful. To check the calibration of the logistic EuroSCORE we have simply compared predicted with observed mortality.
In addition to studying the performance of the logistic EuroSCORE for all cardiac surgery, we have studied its performance in the following surgical subgroups: isolated coronary artery bypass graft (CABG), isolated valve surgery, combined CABG and valve surgery, mitral valve with or without CABG, aortic valve with or without CABG, and other cardiac surgery. The “other” group includes any operation other than a valve operation, a CABG operation or a combination of the two, such as postinfarct ventricular septal defect repair, major aortic surgery and other miscellaneous operations. The observed to expected mortality ratios in these subgroups were compared by χ2 test.
Lastly, as one of the criticisms of the additive EuroSCORE is that it does not predict well in higher-risk patients,4,7 we have studied the performance of the logistic EuroSCORE in low- and high-risk patients by dividing each group into two at an additive EuroSCORE cut-off of 6: low risk (score below 6) and high risk (score of 6 or more).4
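The area under the ROC curve can be computed directly as the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor (the Mann–Whitney formulation, with ties counted as one half). The sketch below, with hypothetical function names, shows that calculation alongside the simple predicted-versus-observed comparison used for calibration; it is an illustration, not the SAS procedure used in the study.

```python
def roc_auc(predicted, died):
    """Area under the ROC curve via pairwise concordance.

    Counts, over all (death, survivor) pairs, how often the death had the
    higher predicted risk; ties count 0.5. O(deaths x survivors), which is
    adequate for a sketch; rank-based methods scale better.
    """
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def mean_predicted_and_observed(predicted, died):
    """Calibration check: mean predicted risk versus observed death rate."""
    n = len(predicted)
    return sum(predicted) / n, sum(1 for d in died if d) / n
```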
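The exact form of the χ2 test is not specified above. One plausible approach, sketched here as an assumption rather than the authors' method, is a test of homogeneity of observed to expected ratios: each subgroup's expected deaths E are rescaled by the pooled ratio c = ΣO/ΣE, and the statistic Σ(O - cE)²/(cE) is referred to a χ2 distribution with k - 1 degrees of freedom for k mutually exclusive subgroups.

```python
def oe_heterogeneity_chi2(observed, expected):
    """Chi-square statistic for equal O/E ratios across subgroups.

    observed: observed deaths per subgroup; expected: model-predicted deaths
    per subgroup (sum of predicted probabilities). Subgroups are assumed
    to be mutually exclusive. Returns (statistic, degrees of freedom).
    """
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("need matching lists for at least two subgroups")
    pooled = sum(observed) / sum(expected)  # overall O/E ratio
    stat = 0.0
    for o, e in zip(observed, expected):
        e_scaled = pooled * e  # expected deaths if every subgroup shared the pooled ratio
        stat += (o - e_scaled) ** 2 / e_scaled
    return stat, len(observed) - 1
```

The p value could then be obtained with, for example, `scipy.stats.chi2.sf(stat, df)`.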
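The low-risk/high-risk split can be sketched as follows, assuming each patient record is a dictionary with the hypothetical keys `additive_score`, `logistic_pred` (predicted probability) and `died`. Each half is then summarised by its size, mean predicted mortality and observed mortality.

```python
def split_by_additive_score(patients, cutoff=6):
    """Split records into low risk (score below cutoff) and high risk (score at or above cutoff)."""
    low = [p for p in patients if p["additive_score"] < cutoff]
    high = [p for p in patients if p["additive_score"] >= cutoff]
    return low, high


def summarise(group):
    """Size, mean logistic predicted mortality and observed mortality of a group."""
    n = len(group)
    if n == 0:
        raise ValueError("empty group")
    predicted = sum(p["logistic_pred"] for p in group) / n
    observed = sum(1 for p in group if p["died"]) / n
    return {"n": n, "predicted": predicted, "observed": observed}
```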
In total 9995 patients underwent cardiac surgery during the study period. Table 1 shows the breakdown by operative category. The “other” cardiac surgery group contained 24 postinfarct ventricular septal defect repairs, 247 major aortic operations and 472 miscellaneous other operations.
Table 2 shows the predictive ability of the logistic EuroSCORE in the different categories. The area under the ROC curve was satisfactory for all types of cardiac surgery, ranging from 0.71 to 0.79.
The logistic EuroSCORE predicted a higher mortality than was observed for all operative subgroups. For all cardiac surgery the predicted mortality was 5.7% compared with an observed mortality of 3.3%, meaning that the logistic EuroSCORE needs to be multiplied by a calibration factor of 0.58 (3.3/5.7) to give an accurate representation of operative risk for this population. The calibration required varied significantly between subgroups, ranging from 0.44 for isolated valve surgery to 0.76 for other cardiac surgery (p = 0.02).
Table 3 gives the predictive ability of the logistic EuroSCORE in low- and high-risk patients. For all types of cardiac surgery the discriminatory ability was equally good in both groups, and the logistic EuroSCORE overpredicted observed mortality to a similar degree in low- and high-risk patients. The findings were similar for isolated CABG.
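The calibration factor is the ratio of observed to predicted mortality, and applying it multiplies each predicted risk by that ratio. A minimal sketch using the all-surgery figures reported above (the helper names are hypothetical):

```python
def calibration_factor(observed_rate, predicted_rate):
    """Factor by which predictions must be multiplied to match observed mortality."""
    if predicted_rate <= 0:
        raise ValueError("predicted rate must be positive")
    return observed_rate / predicted_rate


def recalibrate(predicted_rate, factor):
    """Apply a multiplicative calibration factor to a predicted rate."""
    return predicted_rate * factor


# All cardiac surgery: observed 3.3%, predicted 5.7% (figures from the text).
factor = calibration_factor(3.3, 5.7)
print(f"calibration factor: {factor:.2f}")  # calibration factor: 0.58
```

Multiplicative rescaling is assumed here as the simplest reading of "calibrated by a factor"; other recalibration methods exist.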
The logistic EuroSCORE is a reasonable overall predictor for contemporary cardiac surgery but overestimates observed mortality. The model overpredicts observed mortality to different degrees in the various subgroups. Caution should be exercised when comparing mortality directly against the logistic EuroSCORE or when using the model to compare outcomes between hospitals or surgeons with a differing case mix.
Strengths and weaknesses of the study
Scrutiny of the mortality outcomes of hospitals and surgeons in cardiac surgery is increasing.1–3 To allow fair comparisons to be made it is important to adjust observed mortality for predicted risk, and this should be done with a valid risk adjustment model. We have assessed the predictive ability of the logistic EuroSCORE on a large contemporary cardiac surgery database, to which four hospitals and 25 surgeons contribute. The data have been validated locally and have the confidence of clinicians, but they have not been validated externally, which is a weakness of our study. The study sample is large, and we therefore believe our findings are robust.
We have shown that the logistic EuroSCORE overpredicts observed mortality for all operative groups. One possible explanation is that mortality after surgery in our region is low, so that our findings reflect good performance by the hospitals in northwest England rather than poor performance of the risk model. However, the reported UK mortality for isolated CABG surgery, isolated valve surgery and combined valve and coronary surgery in 2003 was 2.0%, 4.3% and 7.6%, respectively,3 which is similar to the rates reported here. Our unadjusted surgical results therefore reflect the national picture, and there is no reason to suppose that we are achieving mortality in line with national figures while operating on a significantly higher-risk population than elsewhere in the UK.
Strengths and weaknesses of the study compared with other studies
There are several reports on the performance of risk tools in cardiac surgery. The first risk prediction model to find general use was the Parsonnet score.13 This was used by hospitals and surgeons to compare risk-adjusted outcomes with a benchmark,14 but it has now been shown to significantly overpredict observed mortality.3,15 The additive EuroSCORE was first introduced in 1999 and has been studied widely.5,7 It is now clear that it is not accurate at predicting mortality in higher-risk patients and it underestimates observed mortality for combined valve and coronary surgery.4,6,7
In response to these limitations the logistic EuroSCORE was introduced in 2003 and was claimed to be a better predictor of mortality for high-risk patients than the additive model.8,9 It has been shown to be a reasonable predictor of risk (and better than the additive score for high-risk patients) in both US and Italian studies,16,17 although the US study suggested that recalibration may be needed. The model has been shown to overpredict observed mortality in a single-centre Australian report.18 The logistic EuroSCORE has also been reported to have limitations in predicting outcome for thoracic aortic surgery, and a modified score has been suggested.19 Our large multicentre study confirms the previous claims that the logistic EuroSCORE has good predictive ability in both low- and high-risk groups of patients. However, we have also shown clearly that the logistic EuroSCORE overpredicts observed mortality in UK practice, and we have provided contemporary calibration figures for the various operative subgroups, which we believe will be useful for benchmarking outcomes.
Meaning of the study
Our findings should give cause for caution when using the logistic EuroSCORE as a risk adjustment model for comparing institutional or individual surgeon outcomes. Our observation that the logistic EuroSCORE overestimates observed mortality for isolated coronary artery surgery by a factor of two means that it would be easy to gain false reassurance by comparing observed mortality with that predicted by the algorithm. In fact, to be better than average, you would have to obtain a mortality that was less than half that predicted by the logistic EuroSCORE.
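The risk of false reassurance can be illustrated with hypothetical numbers. Suppose a surgeon's isolated CABG mortality is 2.0% against a mean logistic EuroSCORE prediction of 3.0%, and assume a calibration factor of 0.5 (an approximation of the twofold overprediction described above). The uncalibrated observed to expected ratio then looks favourable, but the calibrated ratio is above 1.

```python
def oe_ratio(observed_rate, predicted_rate, calibration=1.0):
    """Observed/expected ratio, with predictions optionally rescaled by a calibration factor."""
    return observed_rate / (predicted_rate * calibration)


# HYPOTHETICAL surgeon: 2.0% observed, 3.0% mean logistic EuroSCORE prediction.
uncalibrated = oe_ratio(2.0, 3.0)     # below 1: appears better than expected
calibrated = oe_ratio(2.0, 3.0, 0.5)  # above 1: worse than the calibrated benchmark
print(f"{uncalibrated:.2f} {calibrated:.2f}")  # 0.67 1.33
```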
The finding that the logistic EuroSCORE overpredicts mortality is not surprising. The model was developed on data from the late 1990s, and surgical practice has changed since then, with increasing numbers of patients with coronary disease being treated percutaneously. Advances in surgery, anaesthesia and intensive care have led to overall reductions in surgical mortality, despite higher predicted operative mortality.3 A model that was calibrated accurately in the 1990s is therefore unlikely to remain accurate. The present study gives appropriate contemporary calibration factors for the model for surgery in northwest England.
Initiatives to benchmark surgical outcomes and performance in the UK have so far focused on CABG surgery and, to a lesser extent, aortic valve replacement.1–3 The proportion of high-risk patients in a surgeon’s practice has been shown to vary greatly between surgeons, and valid comparisons can be made only after adjustment for risk by an appropriate model.4 Cardiac surgical practice is becoming increasingly specialised, with some surgeons developing workloads in which, for example, major aortic surgery or mitral valve repair forms a large proportion. The benchmark operation of isolated CABG may be only a relatively small part of a hospital’s or surgeon’s workload, and to compare outcomes rigorously it seems appropriate to include all cardiac surgery undertaken. Our finding of significant differences in the calibration of the logistic EuroSCORE between operative groups means that the model is not suitable for such purposes without significant modification. However, the calibration factors we have reported may prove useful in allowing hospitals or surgeons to identify risk-adjusted benchmarks against our data.
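Because calibration differs between operative groups, a case-mix-aware benchmark could apply each patient's subgroup factor before summing expected deaths. The sketch below assumes that approach. The factors for isolated valve (0.44) and other surgery (0.76) are those reported above, the isolated CABG value of 0.5 is only an approximation of the twofold overprediction, and the subgroup labels are hypothetical.

```python
# Calibration factors by subgroup (labels are hypothetical).
FACTORS = {
    "isolated_valve": 0.44,  # reported in the text
    "other": 0.76,           # reported in the text
    "isolated_cabg": 0.5,    # approximation of "a factor of two", not an exact reported value
}


def calibrated_expected_deaths(cases, factors):
    """Sum of calibrated predicted probabilities over (subgroup, predicted) pairs.

    Raises KeyError if a case's subgroup has no calibration factor.
    """
    return sum(pred * factors[subgroup] for subgroup, pred in cases)


def calibrated_oe_ratio(observed_deaths, cases, factors):
    """Observed deaths divided by subgroup-calibrated expected deaths."""
    return observed_deaths / calibrated_expected_deaths(cases, factors)


cases = [("isolated_valve", 0.10), ("other", 0.20), ("isolated_cabg", 0.04)]
print(f"{calibrated_expected_deaths(cases, FACTORS):.3f}")  # 0.216
```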
Unanswered questions and future research
Our study has shown some strengths and weaknesses of the logistic EuroSCORE and should prove useful to clinicians who are interested in studying comparative outcomes. Although we have shown variation in the predictive ability of the model in various subgroups, we have not looked at highly specialised areas such as aortic surgery or mitral repair surgery. The logistic EuroSCORE, once calibrated appropriately, may be of use in these groups, or specific risk prediction models may be required for these operations. Although we have shown limitations of the model in adjusting for risk and believe this may have implications when comparing outcomes between surgeons or hospitals with a differing case mix, we do not know how important this effect will be, and it warrants further investigation.
This study was conducted on behalf of the North West Quality Improvement Programme in Cardiac Interventions. The consultant surgeons involved are John Au, Ben Bridgewater, Colin Campbell, John Carey, John Chalmers, Walid Dhimis, Abdul Deiraniya, Andrew Duncan, Brian Fabri, Elaine Griffiths, Geir Grotte, Ragheb Hasan, Tim Hooper, Mark Jones, Daniel Keenan, Neeraj Mediratta, Russell Millner, Nick Odom, Brian Prendergast, Mark Pullan, Abbas Rashid, Franco Sogliani, Paul Waterworth and Nizar Yonan. The following surgeons left the collaboration during the study period: Narinda Bhatnagar, Albert Fagan, Bob Lawson, Udin Nkere, Peter O’Keefe, Richard Page, Ian Weir and David Sharpe.
We thank the audit officers working in each centre for their hard work in collecting and validating the data.
Published Online First 17 March 2006
Competing interests: BB is a Society of Cardiothoracic Surgeons of GB and Ireland representative on the joint Society of Cardiothoracic Surgeons, Healthcare Commission and Department of Health group defining national cardiac surgical audit. BB, AG, GJG, BF and MJ are all members of the steering group of the North West Regional Quality Improvement Programme in Cardiac Interventions.