Review Article
Validation, updating and impact of clinical prediction rules: A review

https://doi.org/10.1016/j.jclinepi.2008.04.008

Abstract

Objective

To provide an overview of the research steps that need to follow the development of diagnostic or prognostic prediction rules. These steps include validity assessment, updating (if necessary), and impact assessment of clinical prediction rules.

Study Design and Setting

Narrative review covering methodological and empirical prediction studies from primary and secondary care.

Results

In general, three types of validation of previously developed prediction rules can be distinguished: temporal, geographical, and domain validation. In case of poor performance at validation, the validation data can be used to update or adjust the previously developed prediction rule to the new circumstances. These updating methods differ in extensiveness; the simplest is adjustment of the model intercept to the outcome occurrence at hand. Prediction rules, with or without updating, that show good performance in (various) validation studies may subsequently be subjected to an impact study, to demonstrate whether they change physicians' decisions, improve clinically relevant process parameters or patient outcomes, or reduce costs. Finally, whether a prediction rule is implemented successfully in clinical practice depends on several potential barriers to its use.

Conclusion

The development of a diagnostic or prognostic prediction rule is just a first step. We reviewed important aspects of the subsequent steps in prediction research.

Introduction

Prediction rules or prediction models, often also referred to as decision rules or risk scores, combine multiple predictors, such as patient characteristics, test results, and other disease characteristics, to estimate the probability that a certain outcome is present (diagnosis) in an individual or will occur (prognosis). They intend to aid the physician in making medical decisions and in informing patients. Table 1 shows an example of a prediction rule.
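
To make the arithmetic behind such a rule concrete, the following sketch (in Python) applies a logistic-regression-based rule to a single patient; the intercept, coefficients, and predictor values are purely hypothetical and do not reproduce the rule shown in Table 1.

    import math

    # Hypothetical regression coefficients of a diagnostic prediction rule
    # (illustrative only; not the rule from Table 1).
    intercept = -3.2
    coefficients = {
        "age_per_10yr": 0.40,    # per 10-year increase in age
        "male_sex": 0.55,
        "abnormal_test": 1.10,
    }

    # Predictor values for one new patient.
    patient = {"age_per_10yr": 6.5, "male_sex": 1, "abnormal_test": 1}

    # Linear predictor: intercept plus the weighted sum of the predictors.
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in patient.items()
    )

    # Predicted probability that the outcome is present (diagnosis) or will
    # occur (prognosis), obtained via the logistic function.
    probability = 1.0 / (1.0 + math.exp(-linear_predictor))
    print(f"Predicted probability: {probability:.2f}")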

In multivariable prediction research, the literature often distinguishes three phases: (1) development of the prediction rule; (2) external validation of the prediction rule (further referred to as "validation"), that is, testing the rule's accuracy and thus generalizability in data that were not used for its development, and subsequent updating if validity is disappointing; and (3) studying the clinical impact of a rule on physicians' behavior and patient outcome (Table 2) [1], [2], [3], [4], [5]. A fourth phase of prediction research may be the actual implementation in daily practice of prediction rules that have passed the first three phases [4]. A quick Medline search using a suggested search strategy [6] demonstrated that the number of scientific articles discussing prediction rules has more than doubled in the last decade: 6,744 published articles in 1995 compared with 15,662 in 2005. Strikingly, these are mainly papers concerning the development of prediction rules. A relatively small number concern the validation of rules, and there are hardly any publications showing whether an implemented rule has an impact on physicians' behavior or patient outcome [3], [4].

The lack of validation and impact studies is unfortunate, because accurate predictions, commonly expressed as good calibration (agreement between predicted probabilities and observed outcome frequencies) and good discrimination (ability to distinguish between patients with and without the outcome), in the patients used to develop a rule are no guarantee of good predictions in new patients, let alone of the rule's use by physicians [1], [3], [4], [7], [8]. In fact, most prediction rules show reduced accuracy when validated in new patients [1], [3], [4], [7], [8]. There are two main reasons for this: (1) the rule was inadequately developed, and (2) there were (major) differences between the derivation and validation populations.
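
To illustrate how these two properties are commonly quantified in a validation sample, the following Python sketch computes the c-statistic (area under the ROC curve) for discrimination and a logistic recalibration fit (calibration intercept and slope) for calibration; the predicted probabilities and outcomes are simulated placeholders, not data from any of the cited studies.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.metrics import roc_auc_score

    # Assumed inputs: predicted probabilities from a previously developed rule
    # and observed outcomes (0/1) in an independent validation sample.
    rng = np.random.default_rng(0)
    p_pred = rng.uniform(0.05, 0.95, size=500)   # placeholder predictions
    y_obs = rng.binomial(1, p_pred)              # placeholder outcomes

    # Discrimination: ability to separate patients with and without the outcome.
    c_statistic = roc_auc_score(y_obs, p_pred)

    # Calibration: regress the outcome on the linear predictor (logit of the
    # predicted probability). A slope near 1 and an intercept near 0 indicate
    # that predicted probabilities agree with observed outcome frequencies.
    linear_predictor = np.log(p_pred / (1 - p_pred))
    recal = sm.Logit(y_obs, sm.add_constant(linear_predictor)).fit(disp=0)
    cal_intercept, cal_slope = recal.params

    print(f"c-statistic: {c_statistic:.2f}")
    print(f"calibration intercept: {cal_intercept:.2f}, slope: {cal_slope:.2f}")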

Many guidelines regarding the development of prediction rules have been published, covering the number of candidate predictors in relation to the number of patients, methods for predictor selection, how to assign weights to each predictor, how to shrink the regression coefficients to prevent overfitting, and how to estimate a rule's optimism using so-called internal validation techniques such as bootstrapping [1], [2], [7], [8], [9], [10], [11], [12], [13], [14].
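
As an illustration of the internal validation step mentioned above, the following sketch estimates the bootstrap optimism in the apparent c-statistic of a newly developed logistic regression rule, along the lines of the bootstrap procedure described by Harrell; the simulated data, model settings, and number of bootstrap replications are assumptions for demonstration only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)

    # Simulated development data: 300 patients, 5 candidate predictors.
    n, k = 300, 5
    X = rng.normal(size=(n, k))
    true_lp = X @ np.array([0.8, 0.5, 0.0, 0.0, -0.4]) - 0.5
    y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))

    def fit_rule(X, y):
        # Essentially unpenalized logistic regression (very large C).
        return LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

    # Apparent performance: the rule evaluated on the data used to develop it.
    apparent_auc = roc_auc_score(y, fit_rule(X, y).predict_proba(X)[:, 1])

    # Bootstrap optimism: refit the rule in each bootstrap sample and compare
    # its performance there with its performance in the original data.
    optimism = []
    for _ in range(200):
        idx = rng.integers(0, n, size=n)
        model_b = fit_rule(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], model_b.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, model_b.predict_proba(X)[:, 1])
        optimism.append(auc_boot - auc_orig)

    # Optimism-corrected (internally validated) c-statistic.
    corrected_auc = apparent_auc - np.mean(optimism)
    print(f"apparent c-statistic:           {apparent_auc:.3f}")
    print(f"optimism-corrected c-statistic: {corrected_auc:.3f}")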

Compared to the literature on the development of prediction rules, the methodology for validating prediction rules and studying their impact is underappreciated [1], [4], [8]. This paper provides a short overview of the types of validation studies, of possible methods to improve or update a previously developed rule in case of disappointing accuracy in a validation study, and of important aspects of impact studies and the implementation of prediction rules. We focus on prediction rules developed with logistic regression analysis, but the issues largely apply to prediction rules developed with other methods, such as Cox proportional hazards analysis or neural networks. The methodology applies to both diagnostic and prognostic prediction rules and is illustrated with examples from diagnostic and prognostic research.

Section snippets

Examples of disappointing accuracy of prediction rules

Even when internal validation techniques are applied to correct for overfitting and optimism, the accuracy of a prediction rule can be substantially lower in new patients than in the development population. For example, the generalizability of an internally validated prediction rule for diagnosing serious bacterial infection in children presenting with fever without apparent source was disappointing [15]. In the development study, the area under the

Updating prediction rules

When a validation study shows disappointing results, researchers are often tempted to reject the rule and immediately develop a new rule using only the data of the validation population. However, whereas the original prediction rules have usually been developed with large data sets, validation studies are frequently conducted in much smaller patient samples, so the redeveloped rules are also based on smaller samples. Furthermore, it would lead to many prediction rules for the same
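
The simplest update method referred to in the abstract, adjusting only the model intercept to the outcome occurrence in the validation sample while keeping the original regression coefficients fixed, can be sketched as follows; the linear predictors and outcomes are simulated placeholders. More extensive updating methods additionally re-estimate, for example, the calibration slope or individual coefficients.

    import numpy as np
    import statsmodels.api as sm

    # Assumed inputs: the original rule's linear predictor (intercept plus
    # weighted predictors) for each patient in the validation sample, and the
    # observed outcomes in that sample.
    rng = np.random.default_rng(2)
    lp_original = rng.normal(-1.0, 1.2, size=400)   # placeholder linear predictors
    y_val = rng.binomial(1, 1 / (1 + np.exp(-(lp_original + 0.7))))  # rule underestimates risk

    # Intercept-only recalibration: keep the original coefficients fixed by
    # entering the linear predictor as an offset and estimate a new intercept.
    update = sm.GLM(
        y_val,
        np.ones((len(y_val), 1)),                   # design matrix: intercept only
        family=sm.families.Binomial(),
        offset=lp_original,
    ).fit()

    intercept_correction = update.params[0]
    print(f"intercept correction: {intercept_correction:.2f}")

    # Updated predicted probabilities for the validation population.
    p_updated = 1 / (1 + np.exp(-(lp_original + intercept_correction)))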

Impact analysis

To ascertain whether a validated diagnostic or prognostic prediction rule will actually be used by physicians, will change or direct physicians' decisions, and will improve clinically relevant process parameters (such as the number of bed days, length of hospital stay, or time to diagnosis) or patient outcomes, or will reduce costs, an impact study or impact analysis should be performed [3], [4]. In the ideal design of an impact study, physicians or care units are randomized to either the index

Implementation of prediction rules

The more often a rule has proven to be accurate in diverse populations, the more likely it is that the rule can be successfully applied in practice [1], [4], [8]. Yet there are still reasons why a rule may not be as successful in daily practice.

First, physicians may feel that their often implicit estimate of a particular predicted probability is at least as good as the probability calculated with a prediction rule, and may therefore not use or follow the rule's predictions [3]

Final comments

We have given an overview of types of validation studies, of methods to improve or update a previously developed diagnostic or prognostic prediction rule in case of disappointing accuracy in a validation study, and of aspects of impact studies and the implementation of prediction rules. A validated, and if necessary updated, rule may cautiously be applied to new patients who are similar to the patients in the development and validation populations. However, when the user has reasons to believe

Acknowledgments

We gratefully acknowledge the support by The Netherlands Organization for Scientific Research (ZonMw 016.046.360; ZonMw 945-04-009).

References (60)

  • K.S. Khan et al. Systematic reviews with individual patient data meta-analysis to evaluate diagnostic tests. Eur J Obstet Gynecol Reprod Biol (2003)
  • D.G. Altman et al. What do we mean by validating a prognostic model? Stat Med (2000)
  • A. Laupacis et al. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA (1997)
  • T.G. McGinn et al. Users' guides to the medical literature: XXII: how to use articles about clinical decision rules. Evidence-Based Medicine Working Group. JAMA (2000)
  • B.M. Reilly et al. Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med (2006)
  • J.H. Wasson et al. Clinical prediction rules. Applications and methodological standards. N Engl J Med (1985)
  • B.J. Ingui et al. Searching for clinical prediction rules in MEDLINE. J Am Med Inform Assoc (2001)
  • F.E. Harrell et al. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med (1996)
  • A.C. Justice et al. Assessing the generalizability of prognostic information. Ann Intern Med (1999)
  • J.B. Copas. Regression, prediction and shrinkage. J R Stat Soc B (1983)
  • B. Efron et al. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat (1983)
  • F.E. Harrell. Regression modelling strategies with applications to linear models, logistic regression, and survival analysis (2001)
  • E.W. Steyerberg et al. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med (2000)
  • S.A. Nashef et al. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg (1999)
  • F. Roques et al. Risk factors and outcome in European cardiac surgery: analysis of the EuroSCORE multinational database of 19030 patients. Eur J Cardiothorac Surg (1999)
  • H.J. Geissler et al. Risk stratification in heart surgery: comparison of six score systems. Eur J Cardiothorac Surg (2000)
  • A. Gogbashian et al. EuroSCORE: a systematic review of international performance. Eur J Cardiothorac Surg (2004)
  • Y. Kawachi et al. Risk stratification analysis of operative mortality in heart and thoracic aorta surgery: comparison between Parsonnet and EuroSCORE additive model. Eur J Cardiothorac Surg (2001)
  • P. Michel et al. Logistic or additive EuroSCORE for high-risk patients? Eur J Cardiothorac Surg (2003)
  • S.A. Nashef et al. Validation of European System for Cardiac Operative Risk Evaluation (EuroSCORE) in North American cardiac surgery. Eur J Cardiothorac Surg (2002)