Objective To compare the ability of four risk models to predict operative mortality after coronary artery bypass graft surgery (CABG) in the United Kingdom.
Design Prospective study.
Setting Two cardiothoracic centres in the United Kingdom.
Subjects 1774 patients having CABG.
Main outcome measures Risk factors were recorded for all patients, along with in-hospital mortality. Predicted mortality was derived from the American Society of Thoracic Surgeons (STS) risk program, Ontario Province risk score (PACCN), Parsonnet score, and the UK Society of Cardiothoracic Surgeons risk algorithm.
Results There were significant differences (p < 0.05) between the British and American populations from which the STS risk algorithm was derived with respect to most variables. The observed mortality in the British population was 3.7% (65 of 1774). The mean predicted mortality by STS score, PACCN, Parsonnet score, and UK algorithm was 1.1%, 1.6%, 4.6%, and 4.7%, respectively. The overall predictive ability of the models, as measured by the area under the receiver operating characteristic curve, was 0.64, 0.60, 0.73, and 0.75, respectively.
Conclusions There are differences between the British and American populations for CABG and the North American algorithms are not useful for predicting mortality in the United Kingdom. The UK Society of Cardiothoracic Surgeons algorithm is the best of the models tested but still only has limited predictive ability. Great care must be exercised when using methods of this type for comparisons of units and surgeons.
- cardiac surgery
- risk stratification
In the modern era, results of medical care given by different institutions and doctors are increasingly scrutinised, and the production of hospital league tables is now widespread. Such league tables usually fail to account for case mix and severity, and great caution should be exercised in their interpretation. The development of tools to allow results from different hospitals and surgeons to be compared in a meaningful way is obviously an important goal for internal and external audit.
Operative mortality is an indicator of the quality of cardiac surgery. Comparing different institutions or surgeons on the basis of crude mortality figures may be misleading as mortality is affected by various preoperative characteristics of the patients,1 and before comparisons can be made it is important to adjust the mortality by accounting for these characteristics. The ideal risk stratification model should be easy to implement, objective, an accurate predictor of observed mortality, and in widespread use.
Various models have been developed for use in cardiac surgery, and the first to become popular was the Parsonnet risk stratification system,2 which was developed in the USA in the 1980s. The model allocates additive predicted mortality percentage points to 14 patient risk factors to give a “Parsonnet score,” which is supposed to be indicative of the percent mortality for each patient and has been used to categorise patients into good risk (predicted mortality 0% to 4%), fair risk (5% to 9%), poor risk (10% to 14%), high risk (15% to 19%), and extremely high risk (> 20%). It has been shown to be applicable to British cardiac surgical practice.3 However, it has been criticised for the nature of its statistical derivation,4 5 for systematically overestimating mortality, particularly for high risk patients, and for a scoring system that is quite subjective, again especially in the high risk group. It also omits many surgeons’ “favourite” risk factors, such as the number of coronary vessels diseased, urgency of operation, and presence of chronic obstructive pulmonary disease.6
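The risk bands quoted above can be expressed as a simple lookup. The following is a minimal sketch only: the function name `parsonnet_band` is hypothetical, and treating a score of exactly 20 as "extremely high risk" is an assumption, since the text gives that band as "> 20%".

```python
# Illustrative sketch: band boundaries follow the categories quoted in the text.
# Assumption: a score of exactly 20 is placed in the "extremely high" band.
def parsonnet_band(score: float) -> str:
    """Map an additive Parsonnet score (predicted % mortality) to a risk band."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score < 5:
        return "good"
    if score < 10:
        return "fair"
    if score < 15:
        return "poor"
    if score < 20:
        return "high"
    return "extremely high"


if __name__ == "__main__":
    for s in (0, 4, 7, 12, 18, 45):
        print(s, parsonnet_band(s))
```

The sketch only bands a score that has already been summed; it does not reproduce the 14 risk factor weights, which are given in the original Parsonnet publication.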
More recently other models have been developed. The Society of Thoracic Surgeons (STS) in the USA has produced an algorithm to predict operative mortality for patients undergoing coronary artery surgery alone.7 It is based on a very large patient population and has been developed using the Bayes theorem. It has been shown to be a good predictor of observed mortality in the USA.7 8 In Canada, a risk index has been developed to predict mortality, prolonged stay in the intensive care unit, and prolonged hospital stay9 in all patients undergoing cardiac surgery. In the United Kingdom a model has recently been produced to predict mortality following coronary artery bypass graft surgery (CABG).10 We have studied the ability of these models to predict mortality after CABG in the United Kingdom.
We have studied all patients undergoing CABG alone in two centres: Wythenshawe Hospital, Manchester, and The Royal Brompton Hospital, London.
WYTHENSHAWE HOSPITAL
The patients at Wythenshawe Hospital were recruited between February 1995 and January 1996. Risk factors were recorded in a computerised database by the operating surgeon at the time of surgery, from data collected on admission on structured patient clerking sheets. Data were checked for completeness and omitted or erroneous data were completed by reference to hospital notes. Once a complete dataset was obtained for each patient there was no further verification.
ROYAL BROMPTON HOSPITAL
The patients from the Royal Brompton Hospital were those operated upon between April 1994 and March 1995. Data were collected on structured clerking sheets by junior medical staff and transcribed onto a computerised database by clerical staff. Once again, there was no subsequent verification once a complete dataset had been obtained.
PATIENT RISK FACTORS
The definitions used for some of the risk factors are shown in table 1. These definitions are those given by the Society of Thoracic Surgeons for the USA national database.11 Renal failure at Wythenshawe Hospital was defined as plasma creatinine greater than 140 μmol/l or dialysis dependent, and at The Royal Brompton Hospital as plasma creatinine greater than 150 μmol/l, urea greater than 15 mmol/l, or requiring dialysis. For ejection fraction a single estimated figure was collected at Wythenshawe, and at the Royal Brompton it was graded as good (50% or more), fair (30 to 49%), or poor (less than 30%).
The Parsonnet score was derived from the risk factors as described previously and shown in table 2.2 For the purposes of this study we have automatically given an additional score of 10 points for a catastrophic state, rather than allowing between 10 and 50 as suggested in the original Parsonnet system, to decrease subjectivity on the higher risk patients. To calculate the STS risk score the data were imported into the commercially available STS risk stratification software (Summit Medical Systems, Nice, France) and a risk prediction for each patient was calculated. The PACCN score was derived as shown in table 3,9 and the UK Society of Cardiothoracic Surgeons score (UK national score), as published using the Bayes theorem, is shown in table 4.10
Mortality was recorded from patient and hospital records. Operative mortality was defined as death temporally or causally related to surgery (death within 30 days of operation or in the same hospital admission as operation, regardless of cause). The cause of death was recorded from patient records or necropsy examination when available.
The incidence of risk factors was compared by χ2 test between the two centres in the United Kingdom, and between the combined Wythenshawe/Brompton population and the population reported by the STS in the USA from which the STS risk prediction algorithm was generated. Ninety five per cent confidence limits for the British and American populations are also given, along with the relative risk (RR). The observed mortality in our sample was compared with the overall mortality in the United Kingdom and the USA, again by χ2 test. The risk predictions were not normally distributed and so the mean scores, median scores, and ranges are given. Receiver operating characteristic (ROC) curves were plotted for the predictive value of the different models and the area under the ROC curves was calculated as an index of the overall predictive ability of the models.
PREOPERATIVE PATIENT CHARACTERISTICS
The incidence of risk factors at Wythenshawe Hospital and the Royal Brompton Hospital is shown in table 5, and those for the pooled United Kingdom population and those reported in the USA by the STS are shown in table 6. The average age of the British population was 60.6 years (range 32 to 83 years); 82% were male and 18% female. There were significant differences between the two British centres with respect to age greater than 70 years, smoking history, renal failure, chronic obstructive pulmonary disease, previous cerebrovascular accident, unstable angina, and redo surgery. There were significant differences between the British and the American populations with respect to all variables except renal failure and redo surgery. Morbid obesity (relative risk (RR) 7.2), ejection fraction less than 50% (RR 3), unstable angina (RR 1.9), age greater than 70 years (RR 1.8), diabetes (RR 1.5), female sex (RR 1.4), hypertension (RR 1.3), and non-elective surgery (RR 1.2) were more prevalent in the USA. Intravenous nitrates (RR 0.1), previous cerebrovascular accident (RR 0.4), left main stem disease (RR 0.5), smoking history (RR 0.6), recent percutaneous transluminal coronary angioplasty (PTCA) (RR 0.7), and chronic obstructive pulmonary disease (RR 0.7) were more common in the United Kingdom.
Documentation of operative mortality was exhaustive and we have obtained alive/dead status on all patients. The operative mortality was 65 of 1774 patients (3.7%), 39 of 995 at Wythenshawe (3.9%) and 26 of 779 at the Royal Brompton (3.3%). This difference was not significant. The overall mortality was similar to the 3.5% reported in the United Kingdom cardiac surgical register 1994/95 and slightly higher than 2.8% for the STS 1991-1993 (p < 0.05). The commonest causes of death were cardiac in 68%, respiratory in 15%, and sepsis 5%. The causes of death in the British and American populations are shown in table 7. Respiratory deaths were more common in the British population (p < 0.05), and multisystem failure more common in the USA (p < 0.05).
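As an illustration of the comparison between the two centres, the following minimal sketch applies a Pearson χ2 test to the 2 × 2 table of deaths and survivors. It assumes no continuity correction was applied (the text does not say whether one was used), and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(deaths_a, total_a, deaths_b, total_b):
    """Pearson chi-square (1 df, no continuity correction) for two mortality rates."""
    a, b = deaths_a, total_a - deaths_a   # centre A: deaths, survivors
    c, d = deaths_b, total_b - deaths_b   # centre B: deaths, survivors
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Deaths and totals as reported: Wythenshawe 39 of 995, Royal Brompton 26 of 779.
chi2, p = chi_square_2x2(39, 995, 26, 779)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")
```

With these counts the statistic falls well below 3.84, the 5% critical value for one degree of freedom, which agrees with the statement that the difference between the centres was not significant.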
The mean overall predicted mortality was 1.07% (median 0.9, range 0.3% to 11.5%) by the STS algorithm, 1.6% (median 1.5, range 0% to 9%) by the PACCN score, 4.6% (median 3, range 0% to 45%) by Parsonnet score, and 4.7% (median 4.3%, range 0.5% to 76%) by the UK national score.
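One simple way to summarise how far each model's mean prediction lies from the observed mortality is the ratio of observed to expected mortality. This ratio is not reported in the study; the sketch below is an illustration using the figures given in the text, and the names `observed_to_expected` and `oe_ratios` are hypothetical.

```python
# Observed operative mortality in the pooled British population (65 of 1774).
observed_pct = 100 * 65 / 1774

# Mean predicted mortality (%) for each model, as quoted in the text.
mean_predicted_pct = {
    "STS": 1.07,
    "PACCN": 1.6,
    "Parsonnet": 4.6,
    "UK national": 4.7,
}


def observed_to_expected(observed, expected):
    """Ratio above 1: the model underpredicts; below 1: it overpredicts."""
    if expected <= 0:
        raise ValueError("expected mortality must be positive")
    return observed / expected


oe_ratios = {
    model: observed_to_expected(observed_pct, predicted)
    for model, predicted in mean_predicted_pct.items()
}

for model, ratio in oe_ratios.items():
    print(f"{model}: O/E = {ratio:.2f}")
```

On these figures the two North American models give ratios well above 1 (underprediction), whereas the Parsonnet and UK national scores give ratios below 1 (overprediction).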
The areas under the ROC curves were 0.64 for the STS algorithm, 0.60 for the PACCN score, 0.73 for the Parsonnet score, and 0.75 for the UK national algorithm. An area of 1 suggests a perfect predictor, and a value of 0.5 is a test of no value. Areas of between 0.5 and 0.7 represent a low accuracy, between 0.7 and 0.9 are useful for some purposes, and higher values represent a high accuracy.12
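The interpretation bands above can be applied mechanically to the four reported areas. This is a minimal sketch: the function name `auc_band` is hypothetical, and the handling of values exactly on a boundary (and of values below 0.5) is an assumption, because the text does not specify it.

```python
def auc_band(auc: float) -> str:
    """Classify an area under the ROC curve using the bands quoted in the text.

    Assumptions: 0.7 and 0.9 fall in the "useful" band; values at or
    below 0.5 are reported as "no value".
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc <= 0.5:
        return "no value"
    if auc < 0.7:
        return "low accuracy"
    if auc <= 0.9:
        return "useful for some purposes"
    return "high accuracy"


reported = {"STS": 0.64, "PACCN": 0.60, "Parsonnet": 0.73, "UK national": 0.75}
bands = {model: auc_band(area) for model, area in reported.items()}
for model, band in bands.items():
    print(f"{model}: {band}")
```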
Developing an appropriate tool for predicting risk is an important goal, because only then can operative results be viewed in the context of case mix. Risk stratified mortality data are useful for internal audit processes to allow surgeons to review their results compared with their peers to ensure that the quality of the service they provide is satisfactory. It also allows external audit to be performed in a meaningful way, enabling units to be compared by purchasers or other interested parties. In the increasingly aggressive medicolegal environment, accurate collection of this type of data analysed in a responsible way also provides protection for clinicians.
Before a risk prediction algorithm is used for these aims, the dataset to which it is applied should be validated and trusted, the risk prediction tool should be shown to be an accurate predictor of operative mortality, the system should only be used on the types of practice on which it has been validated, comparisons between populations should not be drawn if there are great differences in case mix, and conclusions should only be drawn after an intelligent and responsible analysis with an informed knowledge of all the factors.
The data presented here have been collected from two surgical units, one in the southeast and one in the northwest of England. Together we perform about 2800 open heart procedures each year and each department supports a busy interventional cardiology programme. We have no reason to think that our case mix is atypical of busy regional units and our observed mortality for coronary artery surgery in the 1774 patients reported here is 3.7%, which is similar to that reported in the United Kingdom Cardiac Surgical Register in 1994/95.13 There was no difference in the overall mortality between the two units. All of the patients in the period studied have been risk stratified, and documentation of 30 day mortality has been exhaustive and complete.
There were some differences in the incidence of patient risk factors between the two centres used in this study: the incidence of age greater than 70, redo surgery, and unstable angina was higher at the Brompton Hospital, but chronic obstructive pulmonary disease, current smoking history, and renal failure were more common at Wythenshawe Hospital, reflecting the nature of the referral patterns and the socioeconomic makeup of the different regions.
There are notable differences between the pooled British patients and those in the USA from which the STS model was derived. Some of these differences appear contradictory: for example, the incidence of non-elective surgery was higher in the USA (20% v 16% in the United Kingdom), but the incidence of intravenous nitrate use was higher in the United Kingdom (9% v 1% in the USA). This paradox is a result of the definitions used and the different nature of health care provision in the two systems. The definitions of salvage, emergency, urgent, and elective operations shown in table 1 were developed for American practice and for the purposes of this study we have used them here. Because of the pressures on the NHS, we almost never operate within 24 hours of referral unless it is an “emergency” operation. A patient who has coronary angiography and is found to have a tight left main stem stenosis and a blocked right coronary artery may well receive an operation the following day in the USA and so be an urgent case, whereas the same patient would probably wait for several days in our hospitals and so be an elective case according to the definitions used. This would cause an underestimation of the predicted mortality according to the STS model in the United Kingdom.
The definitions of some of the risk factors are objective and easily compared, such as renal failure, but others are more difficult. For example, chronic obstructive pulmonary disease is defined as any case where drugs are given for chronic obstructive lung disease or where the FEV1 (forced expiratory volume in one second) is less than 75% of the predicted value. This seems to be an important risk factor in our practice, because 15% of our mortality was due to respiratory causes, compared with 7% in the USA. According to this definition, the incidence of chronic obstructive pulmonary disease in the United Kingdom is 4.5%, against 3.2% in the USA, but we do not measure lung function preoperatively and so only pick up those patients with a history or who are on drug treatment. This underdetection of chronic obstructive pulmonary disease leads to an underprediction of risk and may be another reason why the STS model significantly underestimates our observed mortality. Some of the differences between the United Kingdom and the USA are statistically significant but clinically small (for example, non-elective operations: 16.7% in the United Kingdom, 20.1% in the USA, RR 1.2) and are driven by the huge numbers in the groups, but they are important because the incidence of the risk factors in the population and their associations with mortality in that group are the way in which the risk models are developed. Any difference in the populations casts doubt on the wisdom of using a risk model produced from one population for predicting risk in another statistically different population. Other risk factors also show a marked difference between the two populations (for example, morbid obesity 3.2% in the United Kingdom, 23% in the USA, RR 7.2).
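The relative risks quoted for these two risk factors can be reproduced directly from the reported prevalences, taking the USA figure as the numerator and the United Kingdom figure as the denominator. The function name `relative_risk` in this short sketch is hypothetical.

```python
def relative_risk(prevalence_usa_pct: float, prevalence_uk_pct: float) -> float:
    """Ratio of USA prevalence to United Kingdom prevalence (both in %)."""
    if prevalence_uk_pct <= 0:
        raise ValueError("reference prevalence must be positive")
    return prevalence_usa_pct / prevalence_uk_pct


# Prevalences as quoted in the text.
rr_obesity = relative_risk(23.0, 3.2)        # morbid obesity
rr_non_elective = relative_risk(20.1, 16.7)  # non-elective operations

print(f"morbid obesity RR = {rr_obesity:.1f}")
print(f"non-elective surgery RR = {rr_non_elective:.1f}")
```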
Assessing the accuracy of a multivariate prediction tool is not straightforward. The predictive equations provide a probability of mortality of between 0% and 100% for each patient, based on the incidence of risk factors, but each patient either dies or does not and so direct comparisons of predicted and observed outcome for the individual patient are not a useful measure of the efficacy of the predictive tool. The area under the ROC curve is believed to be a more appropriate statistical measure of the ability of a model to predict what it intends to. The ROC curve is a plot of sensitivity versus 1 − specificity, and the area under the curve is a useful summary measure of the diagnostic accuracy of the tool. An area of 1 suggests a perfect predictor, and a value of 0.5 is a test of no value. Areas of between 0.5 and around 0.7 represent a rather low accuracy, in which the true positive proportion is not much greater than the false positive proportion. Values between 0.7 and 0.9 are useful for some purposes, and higher values represent a high accuracy.12 16 ROC curves have now been widely used for evaluating the accuracy of risk prediction models.9 14 17 18
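The area under the ROC curve is equivalent to the probability that a randomly chosen patient who died was given a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it in that way. It is an illustration rather than the method used in the study, the function name `roc_auc` is hypothetical, and the four patients in the example are invented toy data.

```python
def roc_auc(predicted_risk, died):
    """Area under the ROC curve by pairwise comparison of deaths and survivors."""
    deaths = [p for p, y in zip(predicted_risk, died) if y]
    survivors = [p for p, y in zip(predicted_risk, died) if not y]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                concordant += 1.0      # model ranked the death as higher risk
            elif risk_death == risk_survivor:
                concordant += 0.5      # ties count one half
    return concordant / (len(deaths) * len(survivors))


# Toy data (hypothetical): predicted risks and outcomes (1 = died) for four patients.
toy_risk = [0.10, 0.40, 0.35, 0.80]
toy_died = [0, 0, 1, 1]
toy_auc = roc_auc(toy_risk, toy_died)
print(toy_auc)
```

In the toy example three of the four death and survivor pairs are ranked correctly, giving an area of 0.75. A model that assigned the same risk to every patient would score 0.5 under this definition.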
The overall predictive ability of the STS model as indicated by the area under the ROC curve is 0.64, indicating it is not useful for British practice. The PACCN score was derived from analysis of 6213 patients undergoing cardiac surgery in Ontario province, Canada, in 1991.9 The model was developed using multivariate analysis to determine important risk factors which were then combined into an additive model. The model was then validated on a test set of 6885 patients from the same region in 1992 and shown to predict mortality, very long intensive care stay, and very long postoperative stay, with areas under the ROC curve of 0.75, 0.67, and 0.71 respectively. This risk index does not include any additional score for comorbid, non-cardiac disease, which may account for its poor performance in predicting mortality in our analysis. The area under the ROC curve of 0.60 was the worst of all the models tested.
As has been reported previously,3 the Parsonnet score provides a reasonable predictor of operative death for British practice, but it consistently overpredicts observed mortality, particularly for the higher risk groups. The Parsonnet scores ranged from 0% to 45%, with a mean of 4.6% and a median of 3%. While the scores are not normally distributed, the mean Parsonnet score for the whole group of 4.6% gives some idea of case mix and enables the overall mortality of 3.7% to be viewed in that context. As well as overestimating risk in contemporary clinical practice, the Parsonnet score has been criticised for being subjective at higher levels of risk. We have tried to minimise this in this study by simply including an additional score of 10 points for any patient in a catastrophic state, rather than allowing a range of 10 to 50 points as suggested in the original model.2 The ROC curve analysis suggests that the Parsonnet score has useful but limited predictive ability in this population.
The UK national score10 is still being developed but was derived from a population of 4159 patients undergoing coronary artery surgery alone from three centres. The risk data included in the study have not yet been validated fully. The model was developed using the Bayes theorem, as shown in table 4. This algorithm has not yet been validated on an independent population, but the predicted mortality for our patients of 4.7% again overestimates our observed mortality of 3.7%, though the overall predictive ability as given by an area under the ROC curve of 0.75 was the best of any of the models tested.
Other models exist for predicting mortality, such as that generated in New England14 or the Veterans Affairs Cardiac Surgery Consultants’ Committee,15 but we have not studied their predictive ability as we do not routinely collect the data required for their models.
The predictive models studied have areas under the ROC curve ranging from 0.6 to 0.75. It may be impossible to produce a model with a curve area much higher than this, as some aspects of mortality will always be related to risk factors not included in the model (for example, the quality of the distal vessels) or due to chance happenings not related to preoperative patient characteristics (such as surgical error).16
Development of an appropriate risk stratification model is an important goal in cardiac surgery, and an ideal tool should be easy to implement, objective, an accurate predictor of observed mortality, and in widespread use, allowing comparison between surgeons and units to be made readily. No existing tool fulfils all of these criteria in the United Kingdom, but the Parsonnet score and the preliminary version of the UK national score seem to be the best tools available at present. Even these only have limited ability to predict observed outcome, as shown by the areas under the ROC curve of 0.73 and 0.75 respectively, and great care should still be exercised when using these models.
We would like to thank Maureen Silcock and the staff of the clinical audit department at Wythenshawe for their help with this study, and Philip Kimberley for collating the data from the Royal Brompton. We are grateful to all the consultant surgeons at both institutions for allowing us to study their patients and would also like to thank Geoff Corner for his assistance with the surgical database and Brian Farragher for statistical advice. We would also like to acknowledge the help of Professor Tom Treasure in the preparation of this manuscript. The work was supported in part by a grant from the Northwest Regional Health Authority.