Owing to the unexpected results of some studies, such as CAST (cardiac arrhythmia suppression trial)1 and the World Health Organization clofibrate trial,2 or the studies with cAMP dependent positive inotropic agents, the results of randomised controlled trials are taken to represent the gold standard for therapeutic decisions (“evidence based medicine”). The information gain increases as one proceeds from anecdotal case reports through series of observations, case–control studies, cohort studies, small randomised controlled trials and their meta-analyses (the results of which are not confirmed by randomised controlled trials in up to 30% of cases3), and large randomised controlled trials, to careful meta-analyses of such trials (Cochrane criteria). Surrogate end points based on the expectation of a beneficial effect may lead to erroneous conclusions; the gold standard involves primary, hard end points (predominantly prolongation of life), which may be quality adjusted.
However, even where the results of large randomised controlled trials show a significant increase in survival, the clinician is still confronted with difficulties, especially with diseases of unknown aetiology and pathogenesis. If the causes of anaemia were not fully understood, a positive result from a study of vitamin B-12 treatment could imply that all anaemic patients should be treated with vitamin B-12.4 Such generalised recommendations are common practice: in the postinfarction period, the benefits of aspirin, β blockers, angiotensin converting enzyme (ACE) inhibitors, statins, and rehabilitation are clearly documented. Each treatment reduces mortality by about 20%. By simple addition, a 100% reduction in mortality, that is, immortality, would result; more correctly, if the five effects are independent and multiplicative, a reduction in mortality of about 67% can be expected at best. An exact evaluation would have to be based on a comparison of the five different treatments and all their combinations. This is next to impossible, as more than 450 studies would have to be performed.
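The arithmetic behind these figures can be sketched briefly. The following Python fragment is an illustration only: it assumes that the five relative reductions of 20% act independently and multiplicatively, and that an exact evaluation would mean a pairwise comparison of every on/off combination of the five treatments. Neither assumption is spelled out in the text, and the function names are hypothetical.

```python
from math import comb


def combined_mortality_reduction(relative_reductions):
    """Combined relative reduction in mortality, assuming the individual
    effects are independent and multiply (an assumption, not a trial result)."""
    remaining_mortality = 1.0
    for reduction in relative_reductions:
        remaining_mortality *= 1.0 - reduction
    return 1.0 - remaining_mortality


def pairwise_comparisons(n_treatments):
    """Number of two-arm comparisons among all on/off combinations
    of n_treatments (including the arm with no treatment at all)."""
    n_arms = 2 ** n_treatments
    return comb(n_arms, 2)


reductions = [0.20] * 5  # aspirin, beta blocker, ACE inhibitor, statin, rehabilitation

additive = sum(reductions)                                  # naive addition
multiplicative = combined_mortality_reduction(reductions)   # 1 - 0.8**5
n_studies = pairwise_comparisons(len(reductions))           # C(32, 2)

print(f"additive: {additive:.0%}")
print(f"multiplicative: {multiplicative:.0%}")
print(f"pairwise comparisons: {n_studies}")
```

Under these assumptions the multiplicative estimate is a mortality reduction of roughly two thirds, and the 32 possible treatment combinations give 496 pairwise comparisons, which is consistent with the figure of "more than 450 studies".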
For diseases with a low event rate, large study populations must be recruited, with the disadvantage of heterogeneity. The patients must be followed for several years, with a consequent increase in crossover and dropout rate. It is for these reasons that in chronic stable angina, for example, no megatrials have addressed the question of survival with and without antianginal drug treatment.
Diseases with a low event rate but with rapidly developing therapeutic options pose additional problems. Under these conditions any trial faces a dilemma: either the protocol is adjusted to keep pace with technical progress, or the original protocol is maintained. In the first case the results of such a “dirty” study cannot be adequately interpreted; in the second case the results are reliable but outdated. This dilemma is evident in the selection of patients with chronic stable angina for medical treatment, coronary artery bypass grafting (CABG), or percutaneous transluminal coronary angioplasty (PTCA). The randomised controlled trials comparing CABG with medical treatment were performed in the late 1970s and early 1980s. At that time internal mammary artery grafts were almost never used. Furthermore, medical treatment was not standardised, and there was insufficient use of aspirin and β blockers. The studies comparing PTCA with medical treatment or CABG are fraught with similar problems. In these trials stents were either not available (for example, in the ACME (angioplasty compared with medical treatment) trial5) or were used in a minority of patients (< 10% in the RITA 2 (randomised intervention treatment of angina) trial6). In none of these trials were platelet glycoprotein IIb/IIIa inhibitors given. As these studies do not reflect present standards, how should the results be used for patients being cared for now?
The clinician needs to take account of the study design: the less stringent the inclusion criteria, the greater the number of patients who will be recruited in a given time. This tends to produce positive results. However, because of the heterogeneity of any study population it is likely that, while a majority will benefit, for a minority the new treatment may be ineffective or even detrimental. Furthermore, any generalised application of clinical trial results imposes increased health care expenditure and increases the profits of commercial sponsors. If a more homogeneous study population is selected (for example, in CASS (coronary artery surgery study),7 of 16 262 patients assessed only 780 were randomised, that is, less than 5%), not many patients will be recruited and strictly speaking the results are only valid for the subset of patients who meet the inclusion criteria. Hence for a majority of patients the outcome of such a trial is irrelevant. This argument may apply to several studies and could explain why the mortality, for example, in myocardial infarction, is greater in field studies or in the general population than in randomised controlled trials with carefully predefined selection criteria.8
Many study drugs are used at high dosage to achieve a convincing therapeutic effect. However, in day to day practice the drug is often used in much lower dosage to avoid unwanted side effects, though still with an expectation of benefit. This may be unwarranted.
In therapeutic studies with a positive result, it is rarely clear how long the treatment should be given. It is nowadays common practice to express the therapeutic effect as per cent reduction in event rate, that is, as a relative risk reduction. Such a result is more impressive than the actual number of events prevented. This calculation, however, is mathematically dubious (a percentage calculated from percentages). Taking into account the fact that in many studies the absolute benefit is rather small and that nobody is immortal, the documented therapeutic effect is bound to become insignificant with time. According to the concept of evidence based medicine, the treatment would then have to be stopped. Almost no data are available on when this should happen.
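The difference between a per cent reduction in event rate and the number of events actually prevented can be made concrete with a short calculation. The sketch below uses invented event rates, chosen only for illustration and not taken from any trial, and a hypothetical helper `risk_summary`.

```python
def risk_summary(control_event_rate, treated_event_rate):
    """Return absolute risk reduction (ARR), relative risk reduction (RRR)
    and number needed to treat (NNT) for two event rates given as fractions."""
    arr = control_event_rate - treated_event_rate
    rrr = arr / control_event_rate
    nnt = 1.0 / arr
    return {"arr": arr, "rrr": rrr, "nnt": nnt}


# Hypothetical high risk population: events fall from 20% to 16%.
high_risk = risk_summary(0.20, 0.16)

# Hypothetical low risk population: events fall from 2% to 1.6%.
low_risk = risk_summary(0.02, 0.016)

for label, result in (("high risk", high_risk), ("low risk", low_risk)):
    print(
        f"{label}: relative reduction {result['rrr']:.0%}, "
        f"absolute reduction {result['arr']:.1%}, "
        f"number needed to treat about {result['nnt']:.0f}"
    )
```

Both invented populations show the same relative reduction of 20%, yet the absolute reduction differs tenfold, so that about 25 patients must be treated to prevent one event in the first case and about 250 in the second.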
Further problems are related to financing. Government funds are limited. Thus studies may be devoted less to scientific innovation than to problems of health care, such as economics. In studies financed by commercial sponsors it seems realistic to assume that marketing aspects are not without influence on the study design. If the objective of the study is to obtain a positive result, the new treatment may not be compared with the best available alternative but rather with a conventional regime that is likely to be inferior. A positive result may further be achieved by enrolling patients in whom a pronounced difference is expected. If a negative study result is desired—for example, to demonstrate equivalence of two treatments such as β blockers versus calcium channel blockers in chronic stable angina—small studies with a limited power are conducted, so that even large differences could not be detected.
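How a small study can fail to detect even a large difference may be illustrated with an approximate power calculation. The sketch below uses the normal approximation for comparing two proportions with a two sided test; the event rates and sample sizes are invented for illustration, the function name is hypothetical, and the small contribution of the opposite tail is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two sided test comparing two proportions
    (normal approximation, equal arm sizes, unpooled standard error)."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    standard_error = sqrt(
        p_control * (1.0 - p_control) / n_per_arm
        + p_treated * (1.0 - p_treated) / n_per_arm
    )
    effect = abs(p_control - p_treated)
    return standard_normal.cdf(effect / standard_error - z_critical)


# Hypothetical scenario: the event rate is halved, from 10% to 5%.
power_small = approximate_power(0.10, 0.05, n_per_arm=100)
power_large = approximate_power(0.10, 0.05, n_per_arm=1000)

print(f"100 patients per arm: power about {power_small:.2f}")
print(f"1000 patients per arm: power about {power_large:.2f}")
```

In this invented scenario a halving of the event rate would be missed in most trials with 100 patients per arm, whereas 1000 patients per arm would detect it almost always. A "negative" result from the smaller trial therefore says little about equivalence.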
By convention a level of significance of 5% is generally accepted to reject the null hypothesis. However, it has to be assumed that the statistical evaluation is influenced by a priori evaluation, so that in published studies the number of “false positive” results has been suggested to exceed the theoretically expected 5%.9 For secondary end points, the probability of false positive results likewise exceeds the 5% level.10 The power is likewise important, as indicated above, for studies in which a negative result is desired.
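Why several end points push the false positive probability above 5% can be shown with a simple formula. The sketch below assumes, for illustration only, that the end points are statistically independent and that each is tested at the 5% level without correction; real end points are usually correlated, so the numbers are indicative rather than exact. The function name is hypothetical.

```python
def familywise_false_positive_rate(n_end_points, alpha=0.05):
    """Probability of at least one false positive among n independent
    end points, each tested at level alpha with no true effect present."""
    return 1.0 - (1.0 - alpha) ** n_end_points


for n_end_points in (1, 5, 10):
    rate = familywise_false_positive_rate(n_end_points)
    print(f"{n_end_points} end point(s): about {rate:.0%} chance of a false positive")
```

Under these assumptions the chance of at least one spurious "significant" finding rises from 5% with a single end point to roughly 23% with five and roughly 40% with ten.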
Even publication practices contribute to the problems. Compared with a negative result, a positive result is much more likely to be accepted for publication, to be published in an English language journal, and to be the subject of multiple publications. Positive results are therefore more widely distributed than negative ones. This imposes a severe bias on meta-analyses. The evaluation of published data can be based on different criteria. “Evidence based” means that the data are carefully analysed according to strict rules, such as the Cochrane criteria. These results achieve a high degree of reliability. “Science based” describes the usual scientific approach, and a subjective component cannot be excluded. If the published data are not homogeneous, their evaluation may be “consensus based”. In consensus conferences, “experts” form an opinion which is often taken for the truth. Both the selection of the experts and their approach in coming to their conclusions are subject to personal bias. Most if not all guidelines are consensus based. A source of error common to all three modes of analysis is the reduced probability of negative trials being published, particularly in English language journals.
Evidence based medicine has been advocated for reducing health care costs. According to an estimate by the Cochrane Collaboration, 30% of all medical actions are not evidence based and are therefore dispensable.11 However, there are examples that suggest that the opposite may occur. The WOSCOPS (west of Scotland coronary prevention study)12 and the AFCAPS/TexCAPS (Air Force/Texas coronary atherosclerosis prevention study)13 showed convincingly that statins are effective in the primary prevention of coronary heart disease by reducing cardiovascular mortality. Treatment of all potential beneficiaries, however, would be prohibitively expensive for any national health care system.14
Most studies are sponsored by industry and are based on the premise that the addition of drugs or procedures on top of established treatment improves outcome.15 When such results are incorporated in guidelines for patient care—according to the rules of evidence based medicine—the result is higher costs. Despite the documented beneficial effect of platelet glycoprotein IIb/IIIa inhibitors in patients undergoing PTCA or suffering from an acute coronary syndrome,16-19 treatment of all candidate patients would be very expensive and would exceed the available financial resources.
Because of these limitations, most of our therapeutic decisions are not entirely the result of evidence based medicine. In daily routine practice, empirical criteria—including psychological, social, economic, medical, and technical factors—are at least equally important. The choice of treatment should be tailored to each patient and be based on both objective and subjective criteria—that is, evidence based and experience based medicine.
Undoubtedly evidence based medicine is the gold standard for modern medicine. The results, however, should be applied in patient care with careful reflection. Otherwise evidence based medicine may acquire the same status for the doctor as a lamp post for a drunk: it gives more support than enlightenment.
I thank Priv-Doz Dr Christlieb Haller, Heidelberg, for critically reviewing this paper.