Appropriateness and outcomes: is it time to adopt appropriate use criteria outside of North America?
Ricardo Fonseca, Thomas H Marwick
Menzies Research Institute of Tasmania, Hobart, Tasmania, Australia
Correspondence to Dr T H Marwick, Menzies Research Institute of Tasmania, 17 Liverpool St, Hobart, TAS 7000, Australia; tom.marwick{at}

Health jurisdictions across the world are troubled by the same question: how the imaging component of the health budget can be spent wisely. Contributors to this problem are the ageing population, an apparently limitless demand for imaging, and the increasing sophistication and expense of imaging and treatment. Perhaps nowhere is this more challenging than in cardiovascular imaging, the growth of which has exceeded the overall growth of medical costs over the last decades.1 Part of the response in North America to the disconnect between the growth in imaging and its value has been the development of appropriate use criteria (AUC). Although the initial applications of AUC were not directed towards cardiovascular imaging,2 the initial AUC for transthoracic and transoesophageal echocardiography were launched in 2007 and redefined in 2011,3 and similar developments have occurred for Single Photon Emission Computed Tomography (SPECT) and cardiac CT. The concept of appropriate use has had an extraordinary influence on the relationship between patients, physicians, administrators and insurance companies over the last decade.

AUC differ importantly from guidelines in that they are developed by consensus.4 A potential question, therefore, is whether they are necessarily correct. The study reported by Bhattacharyya et al5 appears somewhat encouraging in this respect. Stress echocardiograms classified as inappropriate were associated with a low risk of cardiac events, similar to results found in Italy by Cortigiani et al6 and in the USA by Aldweib et al.7 Indeed, the findings align with previous work by the author in relation to the clinical utility of AUC for valvular heart disease.8

Nonetheless, the application of AUC remains a primarily North American phenomenon. There are many aspects of medical care in the USA which are mysterious to foreigners. Are AUC merely a component of this difference, or is it time to declare victory for AUC and apply them more widely than in the USA? While the analysis of Bhattacharyya et al5 is useful, we are unconvinced for several reasons.

First, the lack of follow-up events does not itself define inappropriateness. We should expect three outcomes from testing: that the test should provide benefit, that it should reclassify risk, and that the risk should be alterable with intervention. While the link between a low risk for future cardiovascular events and unnecessary testing of patients with a low probability of coronary artery disease is readily understood, this is an imperfect example of inappropriate testing. The original interpretation of ‘appropriateness’ was a situation where the benefit of the test result exceeded the risk (to which we might add financial burden) associated with testing. A gratifyingly negative echocardiogram that reassures an anxious patient and confirms the correctness of a conservative strategy is perhaps the most ‘appropriate’ of scenarios. Conversely, a positive test for ischaemia that predicts an adverse event in a patient who is unsuitable for intervention is actually inappropriate. It is the reclassification of risk that provides the greatest return on the investment in testing.

Second, the AUC are situation-specific but not necessarily evidence-based. This is especially problematic when the only difference between an appropriate and an inappropriate test is the time of evaluation or the symptom status of the patient (table 1). Indeed, had Bhattacharyya et al5 studied AUC other than those for stress echocardiography, the analysis could have produced very different results. In some situations (asymptomatic non-severe valve disease), the test could be expected to have limited or no prognostic value, irrespective of timing and appropriateness. Conversely, the mandate of symptomatic criteria to define AUC in hypertensive heart disease may allow heart failure events to occur even after tests labelled as inappropriate.

Table 1

Comparison between some guideline indications for the clinical application of echocardiography and their corresponding AUC indications

Third, after almost a decade of this work, we see very little evidence of a real decline in ‘inappropriate’ tests, despite continuous attention to the problem of inappropriate use. Analysis of the six most important studies for assessment of the 2011 AUC for stress echocardiography6,9–12 shows that the average rate of appropriate use is around 61%, and that there is no significant correlation between the rate of appropriateness and the enrolment years of the studies. Indeed, the work of Bhattacharyya et al5 provides a scenario for appropriate selection of stress echocardiography that is neither different nor encouraging. The study shows that 62.4% of the stress echocardiograms (SE) were appropriate, 28.4% were inappropriate and 9.2% were uncertain. Interestingly, these results are similar to those obtained by Cortigiani et al,6 who found that 63.4% of SE were appropriate, 27.3% inappropriate and 9.3% uncertain under the same AUC, but with a noticeable difference: the SE evaluated by Cortigiani et al6 were performed between January 2001 and December 2007, whereas the SE evaluated in the current study by Bhattacharyya et al were performed between October 2010 and September 2011. These results should lead us to focus on the actual impact of AUC on health professionals throughout this time. While interventions directed towards interns and residents at the Massachusetts General Hospital,13 and the development of software to link AUC to ordering in the electronic medical record,14 are promising, it remains uncertain whether these processes merely change the attribution of studies based upon the new rules. Certainly, the effects of teaching interventions have not been uniformly favourable, with a prominent negative study of an intervention based upon lectures and training.11

Fourth, the application of AUC is bedevilled by the limited reliability and reproducibility of the assessment of appropriateness. The number of inappropriate tests may be overestimated, and not all scenarios are covered by AUC.

Finally, several aspects of this particular study should caution us regarding more widespread adoption of AUC. The limited size of the group, as well as the restriction of the study to stress echocardiography, limits its external validity in other scenarios. Although the test indications and test results were prospectively collected at the time of stress echocardiography, it seems likely that the AUC classification was made retrospectively. Retrospective allocation of AUC is notoriously difficult.

There is no doubt that we need a method to reduce the overuse of cardiovascular imaging. The correlation of AUC with risk in the paper by Bhattacharyya et al5 is reassuring, but it does not provide evidence of incremental value on top of existing measures. For example, for stress echocardiography, we already have an evidence-based system that relates to coronary artery disease (CAD) risk and risk assessment. AUC appear to be a part of the imaging landscape in North America, but they may not be the solution to the challenges of imaging selection elsewhere. Perhaps the rest of us should contemplate a metric that better measures the information content of testing, in order to find the desired balance between clinical utility and reduction of health expenditure.


