Abstract
Clinical trials traditionally aim to show a new treatment is superior to placebo or standard treatment, that is, superiority trials. There is an increasing number of trials demonstrating a new treatment is non-inferior to standard treatment. The hypotheses, design and interpretation of non-inferiority trials are different to superiority trials. Non-inferiority trials are designed with the notion that the new treatment offers advantages over standard treatment in certain important aspects. The non-inferior margin is a predetermined margin of difference between the new and standard treatment that is considered acceptable or tolerable for the new treatment to be considered ‘similar’ or ‘not worse’. Both relative difference and absolute difference methods can be used to define the non-inferior margin. Sequential testing for non-inferiority and superiority is often performed. Non-inferiority trials may be necessary in situations where it is no longer ethical to test any new treatment against placebo. There are inherent assumptions in non-inferiority trials which may not be correct and which are not being tested. Successive non-inferiority trials may introduce less and less effective treatments even though these treatments may have been shown to be non-inferior. Furthermore, poor quality trials favour non-inferior results. Intention-to-treat analysis, the preferred way to analyse randomised trials, may favour non-inferiority. Both intention-to-treat and per-protocol analyses should be recommended in non-inferiority trials. Clinicians should be aware of the pitfalls of non-inferiority trials and not accept non-inferiority at face value. The focus should not be on the p values but on the effect size and confidence limits.
- Biostatistics
- Study design
- RESEARCH APPROACHES
- Heart Disease
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Clinical trials have flourished in medicine. They are among the most important tools for the proper evaluation of treatment options. The results of clinical trials often form the backbone of practice guidelines, which have permeated all subspecialties of medicine. With their potential to alter clinical practice, clinical trials are of significant clinical, professional and public interest.
Superiority trials
Traditionally, investigators aim to demonstrate in clinical trials that a new treatment is better than standard treatment or placebo. There have been landmark trials evaluating treatment of a wide range of conditions. The Stroke Prevention in Atrial Fibrillation (SPAF) trial1 demonstrated that warfarin was more effective than placebo in stroke prevention in non-valvular atrial fibrillation (NVAF). The Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries (GUSTO) trial2 showed that accelerated tissue plasminogen activator (TPA) offered survival benefits over standard-dose TPA and streptokinase in ST-elevation myocardial infarction (STEMI). The Clopidogrel in Unstable Angina to Prevent Recurrent Events (CURE) trial3 demonstrated the superiority of clopidogrel and aspirin over aspirin alone in non-STEMI. More recently, the Further Cardiovascular Outcomes Research With PCSK9 Inhibition in Subjects With Elevated Risk (FOURIER) trial4 showed that evolocumab was better than placebo in reducing major adverse cardiovascular events in patients with cardiovascular diseases.
The null hypothesis in superiority trials is that the new treatment is not better than standard treatment/placebo. The alternative hypothesis is that the new treatment is better than standard treatment/placebo. If results of the trial are statistically significant, the null hypothesis can be rejected and the alternative hypothesis accepted. If the results are not statistically significant, the null hypothesis cannot be rejected. The Clopidogrel for High Atherothrombotic Risk and Ischemic Stabilization, Management and Avoidance (CHARISMA) trial5 hypothesised that aspirin plus clopidogrel was superior to aspirin in prevention of atherothrombotic events in stable cardiovascular disease. The trial showed a relative risk of 0.93 (95% CI 0.83 to 1.05, p=0.22). Therefore, the null hypothesis could not be rejected and the conclusion was aspirin and clopidogrel combination was not more effective than aspirin monotherapy.
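The CHARISMA result above can be read directly from its confidence interval. The following is a minimal sketch, assuming the usual convention that a ratio measure is statistically significant when its 95% CI excludes 1; the function name is illustrative and is not taken from any trial protocol.

```python
def excludes_no_effect(ci_lower: float, ci_upper: float, null_value: float = 1.0) -> bool:
    """Return True if the confidence interval lies entirely on one side of the null value."""
    return ci_upper < null_value or ci_lower > null_value


# CHARISMA: relative risk 0.93, 95% CI 0.83 to 1.05 (values quoted in the text)
charisma_significant = excludes_no_effect(0.83, 1.05)
print(charisma_significant)  # False: the interval contains 1
```

Because the interval spans 1, the null hypothesis cannot be rejected, which matches the trial's conclusion.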
Non-inferiority trials
More recently, a growing number of clinical trials have aimed to demonstrate that a new treatment is non-inferior to standard treatment. Non-inferiority trials are designed with the notion that the new treatment offers advantages over standard treatment in certain important aspects. The Rivaroxaban Once Daily Oral Direct Factor Xa Inhibitor Compared with Vitamin K Antagonism for Prevention of Stroke and Embolism Trial in Atrial Fibrillation (ROCKET AF) trial6 showed rivaroxaban was non-inferior to warfarin in stroke prevention in NVAF. The Randomized Evaluation of Long Term Anticoagulant Therapy (RE-LY) trial7 showed dabigatran was non-inferior to warfarin in prevention of stroke or systemic embolism in NVAF. In the Placement of Aortic Transcatheter Valves II (PARTNER II) trial,8 transcatheter aortic valve implantation (TAVI) was similar to surgical aortic valve replacement in intermediate-risk patients. In patients with severe mitral regurgitation, MitraClip was shown to be non-inferior to surgery in freedom from death, surgery for valve dysfunction or severe mitral regurgitation in A Study of the Evalve Cardiovascular Valve Repair System - Endovascular Valve Edge-to-Edge REpair STudy II (EVEREST II) trial.9 In the Prospective randomized evaluation of the Watchman Left Atrial Appendage Closure device in patients with atrial fibrillation versus long-term warfarin therapy (PREVAIL) trial, the Watchman device was non-inferior to warfarin in prevention of stroke or systemic embolism.10 There has been an increasing number of non-inferiority trials in cardiology, especially in the area of new devices or technologies. A PubMed search in 2018 for ‘non-inferiority’ or ‘non-inferior trials’ returned 53 publications from 1995 and 789 from 2016.
Table 1 shows a comparison of important aspects between superiority and non-inferiority trials. Despite these differences, it has been argued that the classification of trials as superiority or non-inferiority is arbitrary and somewhat artificial, especially with non-regulatory trials and when classifications of treatment groups as new or standard is ambiguous.11
Hypothesis testing in non-inferiority trials
The null hypothesis in non-inferiority trials is that the new treatment is inferior to standard treatment. The alternative hypothesis is that the new treatment is non-inferior to standard treatment. If the results of the trial are statistically significant, the null hypothesis can be rejected and the alternative hypothesis that the new treatment is non-inferior to standard treatment can be accepted.
Non-inferior margins
A non-inferior margin is specified in non-inferiority trials. The non-inferior margin is the predetermined margin of difference between the new and standard treatments. It represents how much worse the new treatment can be compared with standard treatment, yet still be considered ‘similar’ or ‘not worse’ than standard treatment. It is the maximal loss of efficacy of the new treatment, relative to standard treatment, that can be tolerated in return for some perceived benefits of the new treatment. The non-inferior margin is defined relative to the established benefits of standard treatment over placebo. Figure 1 shows the hypothesis testing and possible outcomes of non-inferiority trials.
Are non-inferiority trials necessary?
Non-inferiority trials may be necessary. For conditions like atrial fibrillation (AF), with proven effective therapies, it is no longer ethical to test any new treatment against placebo. Therefore, many non-inferiority trials use active controls as the comparator. In NVAF, any new oral anticoagulant would need to be tested against warfarin as standard treatment for stroke prevention. Therefore, a non-inferiority trial design with sequential testing would be suitable. A possible reason for testing a new treatment for non-inferiority is that, while it may not be superior, it may be cheaper, more convenient or have fewer side effects than standard treatment.
The premise of non-inferiority trials is that the new treatment being tested is better than standard treatment in some important dimension. While the advantages of the new treatment over standard treatment are clear in some cases (eg, TAVI trials), they may not be as obvious in others. This may be illustrated by non-inferiority trials comparing different stent platforms like the Thin composite wire strut, durable polymer-coated (Resolute Onyx) versus ultrathin cobalt–chromium strut, bioresorbable polymer-coated (Orsiro) drug-eluting stents in allcomers with coronary artery disease (BIONYX) trial.12
Choosing the non-inferior margins
Two methods of choosing the non-inferior margin statistically can be used: relative risk difference and absolute risk difference.
Relative risk difference
A ratio of end point events on the new treatment to that on standard treatment is given as the non-inferior margin. The advantage of this method is that the event rate of the standard treatment does not need to be assumed. Non-inferiority is declared if the upper boundary of the 95% CI of the trial does not exceed that margin.
In trials of novel oral anticoagulants against warfarin in NVAF, the efficacy and confidence limits of warfarin over placebo were considered to determine the non-inferior margin. In general, the 50% rule is applied: at least 50% of the lower confidence limits (worst-case scenario) of the benefits of standard treatment over placebo is to be preserved. The choice of preserving 50% of the established treatment effects, although commonly practised, is empirical. In a meta-analysis of the six major trials of warfarin over placebo in non-valvular AF,13 warfarin resulted in 62% (95% CI 48% to 72%) relative risk reduction of stroke or systemic embolism. In the worst-case scenario, warfarin resulted in 48% relative risk reduction (risk ratio 0.52). The risk ratio of adverse events on placebo over warfarin is the reciprocal of 0.52, that is, 1.92. To preserve 50% of 1.92 on the linear scale gives a non-inferior margin of 1.46 (the excess risk of placebo over warfarin is 92%, half of that is 46%). On the log scale, preserving 50% of 1.92 is the square root of 1.92, that is, 1.38. In Apixaban for the Prevention of Stroke in Subjects With Atrial Fibrillation (ARISTOTLE),14 the non-inferior margin was 1.44 (or 1.38 on the log scale). In Rivaroxaban Once Daily Oral Direct Factor Xa Inhibitor Compared with Vitamin K Antagonism for Prevention of Stroke and Embolism Trial in Atrial Fibrillation (ROCKET AF), the non-inferior margin was 1.46.6
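The margin arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the method of any particular trial: the function name and the `fraction_preserved` parameter are hypothetical, and the only input taken from the text is the worst-case risk ratio of 0.52 for warfarin versus placebo.

```python
import math


def preserved_margin(standard_vs_placebo_rr: float, fraction_preserved: float = 0.5):
    """Return (linear-scale margin, log-scale margin) for a ratio outcome.

    standard_vs_placebo_rr: worst-case risk ratio of standard treatment versus placebo.
    fraction_preserved: share of the standard treatment's benefit to be retained.
    """
    placebo_vs_standard = 1.0 / standard_vs_placebo_rr   # 1 / 0.52, about 1.92
    fraction_lost = 1.0 - fraction_preserved
    # Linear scale: allow the chosen fraction of the excess risk above 1.
    linear = 1.0 + fraction_lost * (placebo_vs_standard - 1.0)
    # Log scale: allow the chosen fraction of the log risk ratio.
    log_scale = math.exp(fraction_lost * math.log(placebo_vs_standard))
    return linear, log_scale


linear_margin, log_margin = preserved_margin(0.52)
print(f"linear: {linear_margin:.2f}, log scale: {log_margin:.2f}")
```

With the 50% rule, the log-scale margin is simply the square root of the placebo-to-standard risk ratio. The linear result reproduces the 1.46 quoted above; the log-scale result comes out at roughly 1.39 when the unrounded reciprocal is used, and the small difference from the 1.38 quoted above is due to rounding.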
Absolute risk difference
Non-inferiority can be declared if the absolute difference in end points between the new and standard treatment is less than a predefined value. Unlike the relative risk method, this method entails an assumption on the event rate on standard treatment.
The actual event rate during the trial of the standard treatment is often lower than the assumed event rate (table 2). This will lead to a higher relative difference as the non-inferior margin and an underpowered trial, favouring non-inferiority. Table 2 shows the predefined non-inferior margins of several non-inferiority trials in cardiology with the assumed and observed event rates and the impact on the calculated relative risk difference.
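The effect of a lower-than-assumed event rate on an absolute margin can be illustrated with a short calculation. The event rates and margin below are hypothetical and are not taken from table 2 or from any trial; the function name is likewise an assumption made for this sketch.

```python
def implied_relative_margin(standard_event_rate: float, absolute_margin: float) -> float:
    """Relative margin implied by an absolute margin at a given event rate
    on standard treatment (both expressed as proportions)."""
    return (standard_event_rate + absolute_margin) / standard_event_rate


# Hypothetical design: 10% assumed event rate, absolute margin of 3 percentage points
at_assumed_rate = implied_relative_margin(0.10, 0.03)
# The same absolute margin if only 6% of patients on standard treatment have an event
at_observed_rate = implied_relative_margin(0.06, 0.03)
print(f"{at_assumed_rate:.2f} {at_observed_rate:.2f}")  # 1.30 1.50
```

The absolute margin is unchanged, but the relative difference that the trial tolerates rises from 1.30 to 1.50 in this example, which is the mechanism by which a low observed event rate favours non-inferiority.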
Equivalence trials versus non-inferiority trials
In non-inferiority trials, investigators are interested in whether new treatment is non-inferior to standard treatment. Only the non-inferior margin to the right side of unity on the forest plot is specified. Therefore, the significance level is usually set as a one-sided p value of 0.025. In equivalence trials, investigators are interested in whether the new treatment is equivalent or ‘the same as’ standard treatment (figure 1). Therefore, margins on both sides of unity are specified. In equivalence trials, the significance level is set as a two-sided p value of 0.05. The COBALT trial hypothesised that double bolus TPA was equivalent to the accelerated regimen in STEMI.15 The prespecified upper limit of absolute difference of 0.4% in 30-day mortality was exceeded. Therefore, the double bolus regimen was not shown to be equivalent to accelerated TPA with respect to 30-day mortality. Equivalence trials are seldom the focus now as non-inferiority trials with sequential testing are preferred.
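The distinction between a one-sided and a two-sided margin can be sketched in a few lines. The confidence interval below is hypothetical and not from COBALT or any other trial; only the 0.4% margin echoes the text, and the function names are illustrative.

```python
def is_non_inferior(ci_upper: float, margin: float) -> bool:
    """One margin only: the upper confidence limit must not exceed the margin."""
    return ci_upper <= margin


def is_equivalent(ci_lower: float, ci_upper: float, margin: float) -> bool:
    """Two margins: the whole interval must lie between -margin and +margin."""
    return -margin <= ci_lower and ci_upper <= margin


# Hypothetical CI for the absolute difference in event rate
# (new minus standard, in percentage points); not taken from any trial.
ci_lower, ci_upper = -1.5, 0.3
margin = 0.4

print(is_non_inferior(ci_upper, margin))          # True
print(is_equivalent(ci_lower, ci_upper, margin))  # False
```

In this made-up case the new treatment would be declared non-inferior but not equivalent, because the interval extends beyond the margin on the side favouring the new treatment.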
In sequential testing, tests for non-inferiority are performed first. If non-inferiority is established, the results are then tested for superiority. Sequential testing is possible only if the trial is designed as a non-inferiority trial with a prespecified non-inferior margin. Most non-inferiority trials are now tested for non-inferiority followed by testing for superiority. The A to Z trial16 evaluated enoxaparin against unfractionated heparin with a non-inferior margin of 1.144. It showed a non-significant hazard ratio (HR) of 0.88 favouring enoxaparin (95% CI 0.71 to 1.08). The upper boundary of the 95% CI did not exceed 1.144. Therefore, enoxaparin was declared non-inferior but not superior to unfractionated heparin. Sequential testing was similarly performed in other non-inferiority trials including ROCKET AF and RE-LY. Caution should be exercised in interpreting p values of sequential testing, especially when p values of superiority and non-inferiority testing are reported side by side.
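The sequential procedure can be expressed as a simple decision rule on the upper confidence limit. This is a minimal sketch for ratio outcomes where values below 1 favour the new treatment; the function name and the wording of the returned labels are assumptions made for illustration.

```python
def sequential_test(ci_upper: float, margin: float) -> str:
    """Test non-inferiority first, then superiority, using the upper confidence limit."""
    if ci_upper > margin:
        return "non-inferiority not shown"
    if ci_upper < 1.0:
        return "non-inferior and superior"
    return "non-inferior, not superior"


# A to Z: HR 0.88, 95% CI 0.71 to 1.08, non-inferior margin 1.144 (values from the text)
a_to_z = sequential_test(ci_upper=1.08, margin=1.144)
print(a_to_z)  # non-inferior, not superior
```

The rule makes explicit that superiority is examined only after the non-inferiority hurdle has been cleared, and that both conclusions are read from the same confidence interval.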
Active versus placebo control
Most non-inferiority trials used active controls as the comparator. Placebo controls are seldom used in non-inferiority trials. However, placebo controls were used in some non-inferiority trials with different aims—examining safety of new treatments.
Non-inferiority design examining safety
The TECOS trial compared sitagliptin with placebo in type 2 diabetes.17 The hypothesis was that sitagliptin was non-inferior to placebo with respect to a primary combined cardiovascular outcome with a non-inferior margin of 1.3. The HR in the TECOS trial was 0.98 in favour of sitagliptin (95% CI 0.89 to 1.08, p=0.68). The upper boundary of the 95% CI did not exceed 1.3. Therefore, sitagliptin was declared non-inferior but not superior to placebo. The purpose of this non-inferiority trial with placebo control was to demonstrate the cardiovascular safety of new hypoglycaemic medications. The Food and Drug Administration (FDA) has mandated the demonstration of cardiovascular safety of new hypoglycaemic agents after the increased risks of death and myocardial infarction shown with rosiglitazone.18 The non-inferior margin of 1.3 was stipulated by the FDA and was used in all such trials with the primary aim of demonstrating the cardiovascular safety of new hypoglycaemic medications. Liraglutide was compared with placebo in the Liraglutide Effect and Action in Diabetes: Evaluation of Cardiovascular Outcome Results (LEADER) trial19 and empagliflozin was compared with placebo in the Empagliflozin Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients (EMPA-REG OUTCOME) trial,20 both with a non-inferior margin of 1.3. Unlike sitagliptin in the TECOS trial, liraglutide and empagliflozin were shown to be non-inferior and superior to placebo with respect to cardiovascular outcomes by sequential testing.
Assumptions of non-inferiority trials
Two inherent assumptions in non-inferiority trials are not necessarily tested in the trial and may not be correct. The first is that new treatment offers advantages over standard treatment. The new treatment may be cheaper, more convenient or easier to administer, more readily available, less invasive or may have fewer side effects. These properties are usually not tested in the trial.
The second assumption is that the new treatment is superior to placebo. This assumption is not tested unless a placebo arm is included. While a placebo control may be used in non-inferiority trials, the purpose of having placebo control is usually not to demonstrate the superior therapeutic effects of the new treatment but to demonstrate its safety. While the new treatment may be demonstrated to be non-inferior to the standard treatment in the trial, this does not necessarily mean it is superior to placebo. This is particularly the case if the standard treatment has only a modest therapeutic advantage over placebo and/or a wide, unjustified non-inferior margin is chosen.
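A rough worst-case check of the second assumption is sketched below. It chains the non-inferior margin with the historical effect of standard treatment over placebo, which assumes that the historical effect still holds in the new trial and that ratios multiply; both are simplifications. The warfarin figures come from the text above, whereas the 'modest' example is hypothetical.

```python
def worst_case_vs_placebo(margin: float, standard_vs_placebo_rr: float) -> float:
    """Worst-case risk ratio of new treatment versus placebo when the new
    treatment is worse than standard treatment by the full margin."""
    return margin * standard_vs_placebo_rr


# Warfarin worst case (risk ratio 0.52 versus placebo) with a margin of 1.46
warfarin_case = worst_case_vs_placebo(1.46, 0.52)   # about 0.76, still better than placebo
# Hypothetical standard treatment with a modest effect (risk ratio 0.85), margin 1.30
modest_case = worst_case_vs_placebo(1.30, 0.85)     # about 1.11, possibly worse than placebo
print(f"{warfarin_case:.2f} {modest_case:.2f}")
```

A product above 1 indicates that a treatment meeting the margin could, in the worst case, be no better than placebo.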
Risks of successive non-inferiority trials
The risks of successive non-inferiority trials are unique to their design. If a new treatment is shown to be non-inferior to standard treatment, all it means is that the new treatment is not worse than the standard treatment by the predetermined non-inferior margin. The new treatment may subsequently replace the old standard treatment and be accepted as the new standard treatment.
If a newer treatment becomes available in the future, that newer treatment is likely to be compared with the new ‘standard’ treatment and not the old standard treatment in further non-inferiority trials. While the newer treatment may be non-inferior to the new standard treatment, it may no longer be non-inferior to the original standard treatment. In this way, less and less effective treatments may be introduced in successive non-inferiority trials.
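The drift described above can be illustrated with a deliberately pessimistic calculation. It assumes that each successive treatment is worse than its comparator by the full margin and that the same margin is reused each time; real trials rarely sit at this extreme, so the numbers are an upper bound for illustration only. The margin of 1.38 and the placebo-to-warfarin ratio of 1.92 are the figures discussed earlier in the article; the function name is hypothetical.

```python
def worst_case_drift(margin: float, generations: int) -> float:
    """Worst-case risk ratio versus the original standard treatment after a
    number of successive non-inferiority trials using the same margin."""
    return margin ** generations


placebo_vs_original = 1.92  # worst-case risk ratio of placebo versus warfarin
for generation in range(1, 4):
    drift = worst_case_drift(1.38, generation)
    status = "no better than placebo" if drift >= placebo_vs_original else "better than placebo"
    print(generation, f"{drift:.2f}", status)
```

Under these assumptions, two generations of non-inferior treatments bring the worst case close to the placebo effect, and a third exceeds it.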
Intention-to-treat analyses
Intention-to-treat analyses may favour non-inferiority results. Protocol violations may dilute any potential differences between the treatment arms, favouring non-inferiority results. Therefore, it is preferable for non-inferiority trials to be analysed according to both the intention-to-treat and per-protocol approaches. Non-inferiority may be declared only if both analyses support the same conclusion. ROCKET AF and RE-LY trials provided both intention-to-treat and per-protocol analyses while PARTNER III and Evolut low-risk trial21 reported per-protocol analysis.
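The dilution effect can be shown with a deliberately simple model. It assumes that the same fraction of patients in each arm ends up receiving the other arm's treatment and takes on that treatment's event rate; the event rates and crossover fraction are hypothetical, and real protocol violations are more varied than this.

```python
def itt_event_rates(rate_new: float, rate_standard: float, crossover: float):
    """Event rates observed under intention-to-treat when a fraction of each
    arm actually receives the other arm's treatment."""
    observed_new = (1 - crossover) * rate_new + crossover * rate_standard
    observed_standard = (1 - crossover) * rate_standard + crossover * rate_new
    return observed_new, observed_standard


# Hypothetical true event rates: 12% on new treatment, 8% on standard treatment
true_difference = 0.12 - 0.08
observed_new, observed_standard = itt_event_rates(0.12, 0.08, crossover=0.20)
itt_difference = observed_new - observed_standard
print(f"true: {true_difference:.3f}, intention-to-treat: {itt_difference:.3f}")
```

In this model the observed difference shrinks by a factor of one minus twice the crossover fraction, so a truly inferior treatment looks closer to the standard treatment than it is.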
Impact of the quality of the non-inferiority trials on results
The quality of the design and execution of clinical trials has a significant impact on the results. Having adequate statistical power, well-defined and high-quality end points, adequate follow-up duration, low loss-to-follow-up and low protocol-violation rates are critical for the success of superiority trials.
Low-quality trials, however, may favour non-inferior results. Poorly defined or soft end points, inadequate statistical power and follow-up duration, high protocol violation and loss-to-follow-up may dilute any potential differences between the new and standard treatment and therefore favour non-inferiority. Carefully choosing the study population is of critical importance in any clinical trial. It is perhaps more important in non-inferiority trials. Choosing a low-risk population in testing for superiority of a treatment over placebo will likely lead to non-significant results. In contrast, having a low-risk population in non-inferiority trials may favour non-inferiority. Furthermore, a more stringent criterion for statistical significance, that is, a one-sided p value of 0.025, is considered desirable for non-inferiority trials compared with the usual α of 0.05. In this respect, practices varied. ROCKET AF, ARISTOTLE, RE-LY, LEADER and EMPA-REG OUTCOME used an α of one-sided 0.025, whereas EVEREST II and PARTNER IA used an α of one-sided 0.05.
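The practical difference between the two significance levels can be seen by converting each one-sided α into the two-sided confidence interval it corresponds to. The sketch below assumes a normal approximation and uses only the Python standard library; the function names are illustrative.

```python
from statistics import NormalDist


def critical_z(one_sided_alpha: float) -> float:
    """Standard normal critical value for a one-sided test at the given alpha."""
    return NormalDist().inv_cdf(1 - one_sided_alpha)


def matching_two_sided_ci(one_sided_alpha: float) -> float:
    """Coverage of the two-sided CI whose upper limit gives the same test."""
    return 1 - 2 * one_sided_alpha


for alpha in (0.025, 0.05):
    print(alpha, f"z = {critical_z(alpha):.3f}", f"CI = {matching_two_sided_ci(alpha):.0%}")
```

A one-sided α of 0.025 corresponds to the upper limit of a 95% CI (z of about 1.96), whereas a one-sided α of 0.05 corresponds to a 90% CI (z of about 1.645), which is narrower and therefore easier to keep below the margin.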
The inclusion of soft end points leads to a higher event rate and therefore allows a more generous non-inferior margin to be chosen. A higher non-inferior margin can be chosen by the relative risk difference method as it is easier to preserve half of the benefits of the standard treatment over placebo due to the higher event rate. Furthermore, assuming a higher event rate may favour non-inferiority.
Reporting of non-inferiority trials
Inadequacies were common in the reporting of non-inferiority trials in the literature. Common deficiencies include inappropriate use of the non-inferiority design, inappropriate claims of non-inferiority or equivalence from non-significant results in superiority trials, and failure to provide a rationale for the non-inferiority design or the non-inferior margin.
In a review of 88 equivalence trials from 1992 to 1996, equivalence was inappropriately declared from non-significant results of superiority testing in two-thirds.22 The GUSTO III trial compared reteplase with accelerated TPA in STEMI.23 The trial was designed as a superiority trial to detect a 20% relative risk reduction in mortality with reteplase. The trial showed a non-significant increase in mortality of 0.23% (95% CI −1.11% to 0.66%) with reteplase. The trial concluded that reteplase and accelerated TPA were similar in their effects on 30-day mortality. Equivalence in this case was inappropriately claimed from a non-significant result in what was designed as a superiority trial.
Many non-inferiority trials failed to report a predetermined non-inferior margin.22 Even when a non-inferior margin was specified, less than half reported how the margin was determined.24 25 Of the 232 non-inferiority trials, only 17 (7.3%) took into account preserving effects of standard treatment, with 15 applying the 50% rule.24 The trial results were analysed according to the intention-to-treat principle in 34.9% and per-protocol analysis in 19.4%. Only 41.8% of the trials analysed results by both methods.25
The Consolidated Standards of Reporting Trials (CONSORT) group provided an extension of the 2010 CONSORT statement in 2012 for reporting non-inferiority and equivalence trials.26 The extension of the checklist includes identification of the trial as a non-inferiority randomised trial in the title, providing the rationale for the non-inferiority design, specifying the non-inferior margin and its rationale and specifying the non-inferiority outcomes.
Non-inferiority trials: what the clinician needs to know
The interpretation of non-inferiority trials is more complex than that of superiority trials. Correct interpretation involves carefully examining the design, the execution as well as the analysis and results of the trials. Is the non-inferiority trial design appropriate? What is the primary non-inferiority hypothesis? What is the non-inferior margin and how was it chosen? These must be stated in all non-inferiority trials. The non-inferior margin must be specified and its rationale clearly stated. Is the study population representative of the condition being studied? An important aspect is to look at the event rate of the standard treatment, which should be comparable to historical results in previous superiority trials of the standard treatment over placebo. This will allow the reader to judge if suitably at-risk patients were recruited in the non-inferiority trial. It is also important to examine the quality of the execution of the trial. Poor quality trials favour non-inferior results. Non-inferiority trials should be analysed according to both intention-to-treat and per-protocol approaches and the results of both analyses should be in concordance. The point estimate and the CIs of the non-inferiority trial should be noted as well, not just the p values showing non-inferiority. This will allow the clinician to assess how close the results of the study are to the non-inferior margin.
The use of non-inferiority trials is likely to increase. Clinicians should not accept the term ‘non-inferior’ at face value. It is important to be aware that ‘non-inferior’ does not mean ‘equivalent’ or ‘similar’ or ‘as good as’. Non-inferiority means the new treatment is not unacceptably worse than standard treatment.
Footnotes
Correction notice Since this article was first published online, figure 1 has been replaced. Further details on the changes can be found in the eletters section of the paper. These eletters can be found in the responses tab of the online article.
Contributors All authors have contributed to the writing, editing, revision and final approval of the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.