Objective To evaluate informal physician judgement versus pretest probability scores in estimating risk in patients with suspected coronary artery disease (CAD).
Methods We included 4533 patients from the PROMISE (Prospective Multicenter Imaging Study for Evaluation of Chest Pain) trial. Physicians categorised a priori the pretest probability of obstructive CAD (≥70% stenosis or ≥50% left main stenosis); Diamond-Forrester (D-F) and European Society of Cardiology (ESC) pretest probability estimates were calculated. Agreement was calculated using the κ statistic; logistic regression evaluated the association between pretest probability estimates and actual obstructive CAD (as determined by CT coronary angiography), and clinical outcomes were modelled using Cox proportional hazards models.
Results Physician estimates agreed poorly with D-F (κ 0.16; 95% CI 0.14 to 0.18) and ESC (κ 0.04; 95% CI 0.02 to 0.05) estimates. Compared with the low-likelihood group, actual obstructive CAD was significantly more prevalent in both the high-likelihood (OR 3.30; 95% CI 2.30 to 4.74) and the intermediate-likelihood (OR 1.43; 95% CI 1.16 to 1.76) physician-estimated groups; ESC estimates similarly differentiated between the three groups (high: OR 9.07; 95% CI 2.87 to 28.70; indeterminate: OR 3.87; 95% CI 1.22 to 12.28). However, using D-F, only the high-probability group differed (OR 2.49; 95% CI 1.74 to 3.54). Only physician estimates were associated with a higher adjusted incidence of death/myocardial infarction/unstable angina hospitalisation, in the high-probability versus low-probability group (HR 2.68; 95% CI 1.52 to 4.74); neither pretest probability score provided prognostic information.
Conclusions Compared with D-F and ESC estimates, physician judgement more accurately identified obstructive CAD and worse patient outcomes. Integrating physician judgement may improve risk prediction for patients with stable chest pain.
Trial registration number NCT01174550.
- chest pain
- diagnostic imaging
- outcome assessment
Data availability statement
Data are available upon reasonable request.
When evaluating patients with stable chest pain in clinical practice, most physicians formulate estimates of patient status to aid decision-making, including the probability of obstructive coronary artery disease (CAD) relevant to potential revascularisation and the overall long-term risk of adverse events. The pretest probability of obstructive CAD has been quantified by the Diamond-Forrester (D-F) risk score,1 which was further validated in the Coronary Artery Surgery Study (CASS) registry and is used in the current US guidelines.2 However, recent studies suggest that some contemporary risk scores overestimate actual risk.3 4 In response, the European Society of Cardiology (ESC) proposed a revised pretest probability score in its 2019 guidelines.4
Physician estimation is a possible substitute, but little is known about how it compares with a formal risk score, or about how physician-generated and risk-score-generated estimates relate to major adverse cardiovascular events. Some groups have evaluated the relationship between race or sex and the treating physician’s pretest estimation of obstructive CAD, but these were older, single-centre, registry-based studies.5 6 To our knowledge, no study has systematically and prospectively compared physicians’ estimates of the pretest probability of CAD with either formal risk score, or assessed the relationship between these estimates and long-term outcomes.
As part of standard baseline clinical data collection, site investigators in the Prospective Multicenter Imaging Study for Evaluation of Chest Pain (PROMISE) trial were asked to estimate the probability of obstructive disease.7 D-F/CASS risk scores2 were calculated for each trial participant but were not provided to physicians. This was the most up-to-date score available at the time of the study and is still the score recommended by the current US guidelines. ESC pretest probability (ESC-PTP) estimates were calculated after that score became available in 2019.4
Understanding whether physician estimation differs from, and/or provides incremental predictive value over and above, traditional risk scores could provide valuable insights into future risk algorithms. The objectives of the current analysis were to (1) determine the agreement between physician and D-F and ESC-PTP estimates of pretest probability in PROMISE, (2) report the frequency of actual obstructive CAD found on coronary computed tomographic angiography (CCTA) or cardiac catheterisation by physician versus D-F and ESC-PTP estimates, and (3) describe the association of physician versus D-F and ESC-PTP estimates with the clinical outcomes of death, myocardial infarction and unstable angina hospitalisation over a median of 25 months of follow-up.
PROMISE datasets are available at https://biolincc.nhlbi.nih.gov/studies/promise. There are no commercial use data restrictions, and no data restrictions based on area of research.
Study population and design
PROMISE was a pragmatic comparative effectiveness trial that enrolled 10 003 patients at 193 sites in North America, representing both community practices and academic medical centres. The PROMISE study design and primary results have been described in detail.7 8 The study enrolled stable, symptomatic outpatients without known CAD who were referred for non-invasive testing. Patients were randomised to an initial anatomical testing strategy with CCTA or a functional testing strategy (exercise treadmill testing, stress echocardiography or stress nuclear imaging) and were followed up for a median of 25 months for outcome events. The local or central institutional review board at each coordinating centre and at each of the 193 enrolling sites approved the study protocol. All participants provided written informed consent.
After exclusions (figure 1), the analytical cohort included 4533 patients with stable chest pain or equivalent who underwent CCTA in the PROMISE trial. Before non-invasive testing, the attending physician categorised the pretest probability of obstructive CAD (≥70% stenosis of a major epicardial artery or ≥50% stenosis of the left main artery) for each patient into one of five categories according to his or her clinical judgement: very low (<10%), low (10%–30%), intermediate (31%–70%), high (71%–90%) or very high (>90%). Categorisation was left to the physician’s discretion. For this secondary analysis, very low and low were combined as low, and high and very high were combined as high. D-F pretest probability was calculated from age, sex and chest pain typicality as previously reported7 and categorised as low (<30%), intermediate (30% to <70%) or high (≥70%); these cut-offs were chosen to remain consistent with the physician categories listed above. Although D-F was the contemporaneous, and therefore appropriate, comparator at the time of PROMISE enrolment, we also applied the 2019 ESC-PTP cut-offs of <5% for low, 5%–15% for indeterminate and >15% for high pretest probability.4 The ESC-PTP range of 5%–15% is termed indeterminate (rather than intermediate) because the guidelines advise that non-invasive testing be considered after assessing the overall clinical likelihood based on clinical modifiers.
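The grouping rules above can be expressed as a small lookup, sketched here in Python. The function names (`physician_group`, `df_group`, `esc_group`) are hypothetical, the inputs are assumed to be already-calculated pretest probabilities in percent (the D-F and ESC-PTP lookup tables themselves are not reproduced), and a value of exactly 15% is assumed to fall in the ESC-PTP indeterminate band, following the 5%–15% wording.

```python
# Hypothetical helpers illustrating the three-group categorisation described above.

# Five physician categories collapsed to three (very low + low -> low; high + very high -> high).
PHYSICIAN_COLLAPSE = {
    "very low": "low",               # <10%
    "low": "low",                    # 10%-30%
    "intermediate": "intermediate",  # 31%-70%
    "high": "high",                  # 71%-90%
    "very high": "high",             # >90%
}


def physician_group(category: str) -> str:
    """Collapse a five-level physician estimate into low/intermediate/high."""
    return PHYSICIAN_COLLAPSE[category.strip().lower()]


def df_group(pretest_pct: float) -> str:
    """D-F cut-offs: low <30%, intermediate 30% to <70%, high >=70%."""
    if pretest_pct < 30:
        return "low"
    if pretest_pct < 70:
        return "intermediate"
    return "high"


def esc_group(pretest_pct: float) -> str:
    """2019 ESC-PTP cut-offs: low <5%, indeterminate 5%-15%, high >15%.

    Assumption: exactly 15% is treated as indeterminate.
    """
    if pretest_pct < 5:
        return "low"
    if pretest_pct <= 15:
        return "indeterminate"
    return "high"


# Example: one hypothetical patient classified by all three methods.
print(physician_group("Very low"), df_group(45.0), esc_group(22.0))
```

Note that the D-F and ESC-PTP bands differ by an order of magnitude, so the same patient can readily fall into different groups under the two scores.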
Patient and public involvement
Patients or the public were not involved in the design, or conduct, or reporting or dissemination plans of our research.
Continuous baseline characteristics were summarised as means and SD or medians with 25th and 75th percentiles, and categorical variables as frequencies and percentages. Continuous baseline variables were compared between groups using the Wilcoxon rank sum test, and categorical variables using Pearson’s χ2 or Fisher’s exact test. Agreement between estimates was calculated using the κ statistic. Logistic regression models were used to assess the association between physician, D-F or ESC-PTP estimates and CAD prevalence. Cox proportional hazards models were used to calculate adjusted hazard ratios for the association between physician, D-F or ESC-PTP estimates and the clinical outcomes of death, myocardial infarction and unstable angina hospitalisation over a median of 25 months of follow-up. The proportional hazards assumption was assessed and met. We adjusted a priori for race; body mass index (BMI); hypertension; metabolic syndrome; dyslipidaemia; history of carotid, peripheral vascular or cerebrovascular disease; smoking (ever vs never); family history of premature CAD; depression; physical activity; and CAD. Age, sex and chest pain typicality were not included in the models because they are part of the D-F classification. Stepwise selection with conservative entry and exit criteria (entry criterion: p value <0.1; exit criterion: p value >0.2) was used to select the ‘best’ subset of predictors of test positivity. The Hosmer-Lemeshow goodness-of-fit test was used to assess the calibration of the final model, and the area under the receiver operating characteristic curve was used to assess its discriminatory capacity. Statistical significance was set at α=0.05. All analyses were performed in SAS V.9.4 (SAS Institute).
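For readers less familiar with the κ statistic, the following Python sketch computes unweighted Cohen’s κ from a square agreement table: observed agreement is the proportion of patients on the diagonal, and chance agreement is the sum of the products of the row and column marginal proportions. The text does not state whether a weighted κ was used, so the unweighted form here is an assumption; the function name `cohens_kappa` and the example tables are hypothetical, not PROMISE data.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] = number of patients placed in category i by rater A
    (eg, the physician) and category j by rater B (eg, a risk score).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    n = sum(sum(row) for row in table)

    # Observed agreement: proportion of patients on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n

    # Chance agreement: sum over categories of row marginal x column marginal, over n^2.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2

    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical 2 x 2 example: 35 of 50 concordant, chance agreement 0.5.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))
```

For the hypothetical table `[[20, 5], [10, 15]]`, observed agreement is 0.70 and chance agreement is 0.50, giving κ = 0.40; a raw agreement rate can therefore look moderate while κ remains low when the marginal distributions already make many matches likely.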
Study population and cohorts
The majority of healthcare providers participating in the PROMISE trial were cardiologists (86.9%), followed by internal medicine specialists (5.3%), physician assistants/nurse practitioners (3.7%) and family medicine physicians (1.3%).
Among the 4533 patients included in the analyses, physicians categorised 209 (4.6%) as having a high probability, 2630 (58.0%) as having an intermediate probability and 1694 (37.4%) as having a low probability of obstructive CAD. In contrast, D-F categorised 1197 (26.4%) patients as having a high probability, 2854 (63.0%) as having an intermediate probability and 482 (10.6%) as having a low probability of obstructive CAD; ESC-PTP categorised 2115 (46.7%) patients as having a high probability, 2275 (50.2%) as having an indeterminate probability and 143 (3.2%) as having a low probability. Baseline clinical characteristics stratified by physician, D-F and ESC-PTP estimates of obstructive CAD are shown in table 1. There were several significant differences in cardiac risk factors and symptoms across the high, intermediate (or indeterminate) and low groups defined by physician, D-F and ESC-PTP estimates.
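As an arithmetic check, the group percentages above can be recomputed from the reported counts; each method should classify all 4533 patients exactly once. The dictionary layout and the `percentages` helper are illustrative only.

```python
COHORT_N = 4533

# Counts reported in the text for each estimation method.
counts = {
    "physician": {"low": 1694, "intermediate": 2630, "high": 209},
    "D-F": {"low": 482, "intermediate": 2854, "high": 1197},
    "ESC-PTP": {"low": 143, "indeterminate": 2275, "high": 2115},
}


def percentages(groups: dict[str, int], total: int) -> dict[str, float]:
    """Return each group's share of the cohort as a percentage rounded to 1 dp."""
    return {name: round(100 * n / total, 1) for name, n in groups.items()}


for method, groups in counts.items():
    # Every patient should be classified exactly once by each method.
    assert sum(groups.values()) == COHORT_N, method
    print(method, percentages(groups, COHORT_N))
```

The recomputed shares match those reported, and the contrast is plain: physicians placed 4.6% of patients in the high group, compared with 26.4% for D-F and 46.7% for ESC-PTP.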
Agreement of physician estimates versus D-F or ESC-PTP for the presence of obstructive CAD
Agreement on the pretest probability of obstructive CAD was poor both between physician and D-F estimates (51.2%, 2322/4533; κ 0.16, 95% CI 0.14 to 0.18) and between physician and ESC-PTP estimates (34.9%, 1580/4533; κ 0.04, 95% CI 0.02 to 0.05) (online supplemental table 1). Physicians generally estimated a lower probability of obstructive CAD than D-F or ESC-PTP (figure 2). For example, very few patients were judged by physicians to have a higher probability of obstructive CAD than estimated by D-F (2.7%; 122/4533) or ESC-PTP (1.3%; 61/4533).
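Because κ corrects observed agreement for agreement expected by chance, κ = (p_o − p_e)/(1 − p_e), the reported values can be rearranged to show roughly how much agreement chance alone would produce given each method’s marginal distribution. This is a back-of-envelope sketch: the κ values are rounded to two decimals and unweighted κ is assumed, so the implied chance agreement is approximate; the helper name is hypothetical.

```python
def implied_chance_agreement(p_obs: float, kappa: float) -> float:
    """Solve kappa = (p_obs - p_exp) / (1 - p_exp) for p_exp."""
    return (p_obs - kappa) / (1 - kappa)


N = 4533

# (comparison, number of concordant patients, reported kappa) from the text.
for label, agree, kappa in [("physician vs D-F", 2322, 0.16),
                            ("physician vs ESC-PTP", 1580, 0.04)]:
    p_obs = agree / N
    p_exp = implied_chance_agreement(p_obs, kappa)
    print(f"{label}: observed {p_obs:.1%}, implied chance {p_exp:.1%}")
```

With these inputs, the implied chance agreement is roughly 42% for D-F and 32% for ESC-PTP, so the observed agreement of 51.2% and 34.9% exceeds chance only modestly.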
Prevalence of observed obstructive CAD by physician estimates versus D-F or ESC-PTP
Among physician-estimated groups, obstructive CAD was most commonly found in the high-probability group (27.3%) compared with the intermediate (12.6%) or low (8.7%) groups (table 2). Among D-F estimates, obstructive CAD was also most commonly found in the high-probability group (20.0%), with similar rates in the intermediate (8.9%) and low (8.9%) groups. Among ESC-PTP estimates, obstructive CAD was also most commonly found in the high-probability group (16.6%), with lower rates in the indeterminate (8.0%) and low (2.1%) groups. In a multivariable model adjusting for important baseline characteristics, the physician-estimated high-probability and intermediate-probability groups were significantly more likely than the low-probability group to have actual obstructive CAD on CCTA (OR 3.30; 95% CI 2.30 to 4.74; and OR 1.43; 95% CI 1.16 to 1.76, respectively) (figure 3 and online supplemental table 2). Compared with the low-probability group, patients estimated by D-F as having high probability were significantly more likely to have actual obstructive CAD (OR 2.49; 95% CI 1.74 to 3.54), but those in the intermediate group were not; with ESC-PTP, both the high-probability and indeterminate-probability groups were significantly more likely to have actual obstructive CAD (OR 9.07; 95% CI 2.87 to 28.70; and OR 3.87; 95% CI 1.22 to 12.28, respectively).
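Assuming the reported 95% CIs are Wald intervals on the log scale (the usual output for logistic and Cox models, although the text does not say so), the standard error of each log OR or log HR can be recovered from the CI width, and the point estimate should sit near the geometric mean of the bounds. The hypothetical helper below sketches that arithmetic. The wide ESC-PTP intervals correspond to large standard errors, which is consistent with the small ESC-PTP low-probability reference group (143 patients).

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_ci(estimate: float, lower: float, upper: float) -> dict[str, float]:
    """Recover log-scale SE, z and two-sided p from a ratio estimate and its 95% CI.

    Assumes the CI is a Wald interval on the log scale: exp(log(est) +/- 1.96 * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    # Under the Wald assumption, the estimate is the geometric mean of the bounds.
    geo_mean = math.sqrt(lower * upper)
    return {"se": se, "z": z, "p": p, "geo_mean": geo_mean}


# Ratios reported in the text: (estimate, lower 95% bound, upper 95% bound).
reported = {
    "physician high vs low (OR)": (3.30, 2.30, 4.74),
    "physician intermediate vs low (OR)": (1.43, 1.16, 1.76),
    "D-F high vs low (OR)": (2.49, 1.74, 3.54),
    "ESC-PTP high vs low (OR)": (9.07, 2.87, 28.70),
    "ESC-PTP indeterminate vs low (OR)": (3.87, 1.22, 12.28),
    "physician high vs low, death/MI/UA (HR)": (2.68, 1.52, 4.74),
}

for label, (est, lo, hi) in reported.items():
    r = wald_from_ci(est, lo, hi)
    print(f"{label}: SE(log) {r['se']:.3f}, z {r['z']:.2f}, "
          f"p {r['p']:.2g}, geometric mean {r['geo_mean']:.2f}")
```

Each reported estimate lies within rounding of the geometric mean of its bounds, which supports reading the intervals as log-scale Wald intervals; the ESC-PTP ORs are the largest but also the least precise.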
Clinical outcomes according to physician versus D-F or ESC-PTP estimates of obstructive CAD
Among physician-estimated groups, the combined outcome of all-cause death/myocardial infarction/unstable angina hospitalisation occurred most frequently in the high-probability group (8.1%) compared with the intermediate (2.8%) or low (2.7%) groups over a median of 25 months of follow-up (online supplemental table 3). After adjustment, patients in the high-probability group were more likely than those in the low-probability group to experience the combined outcome (HR 2.68; 95% CI 1.52 to 4.74), whereas there was no significant difference between the intermediate-probability and low-probability groups (figure 4A and online supplemental table 4). Among D-F estimates, the combined endpoint also occurred most frequently in the high-probability group (4.1%) compared with the intermediate (2.5%) and low (3.3%) groups; however, after adjustment, neither the high-probability nor the intermediate-probability group differed significantly from the low-probability group (figure 4B and online supplemental table 4). Among ESC-PTP estimates, the combined endpoint also occurred most frequently in the high-probability group (4.1%) compared with the indeterminate (2.0%) and low (3.5%) groups; again, after adjustment, neither the high-probability nor the indeterminate-probability group differed significantly from the low-probability group (figure 4C and online supplemental table 4). Similar findings were observed with each of the three estimates for the combined endpoint of cardiovascular death/myocardial infarction/unstable angina.
While pretest probability scores have historically been used to guide risk stratification and clinical assessment of patients with suspected CAD, this study supports recent guideline recommendations that see a limited role for formal pretest probability scoring in this setting. We found that physician, D-F and ESC-PTP estimates were all able to stratify the probability of obstructive CAD in patients with stable chest pain, although agreement between physician estimates and either the D-F or ESC-PTP algorithm was poor. Compared with patients considered low risk, those considered higher risk by physicians (high or intermediate) or by ESC-PTP (high or indeterminate) were significantly more likely to have obstructive CAD. In contrast, only high-risk patients per D-F were significantly more likely than low-risk patients to have obstructive CAD. Physician-estimated high-risk patients were also at greater risk of future events, whereas there was no significant association with future events for D-F or ESC-PTP at any risk level. Taken together, this study suggests that clinical judgement may be a better determinant of risk than a single pretest probability risk score.
Clinical judgement is a central element of the medical profession and essential for physician performance, but relatively little is known about the relationship between clinical judgement and subsequent patient management.9 Physician judgement has been evaluated previously in the diagnosis of such conditions as pulmonary embolism and pneumonia.10 11 In the evaluation of acute coronary syndrome, physician experience may lead to a more accurate diagnosis, but data are limited to smaller, single-centre studies.12 13 Although the relationship between race or sex and the treating physician’s pretest estimation of obstructive CAD has been explored in stable chest pain, physician estimates were not captured systematically.5 6 Meanwhile, mounting evidence indicates that the large majority of functional stress tests in outpatients with a clinical syndrome of possible ischaemia are normal, and very few of these patients will experience an untoward clinical event.2–5 7 As a result, there is substantial interest in developing strategies to identify both the lowest risk patients who may not require testing14–16 and those at highest risk who should potentially proceed directly to cardiac catheterisation.17 Understanding the role of physician estimates of obstructive disease relative to traditional risk scores is clinically appealing and could identify the patients most likely to benefit from testing.
We found that in nearly half of patients with stable chest pain (46.1%; 2089/4533), physicians judged obstructive disease to be less likely than D-F suggested. This extends contemporary thinking that D-F tends to overestimate the prevalence of obstructive disease and/or subsequent event rates.3 4 18 We further found that clinical judgement and ESC-PTP estimates each discriminated the likelihood of obstructive CAD in the high-risk and intermediate/indeterminate-risk groups compared with the low-risk group, whereas only the D-F high-risk group was associated with a higher prevalence of obstructive CAD. The highest proportion of obstructive CAD was found in the physician high-risk group (27.3%) and the lowest in the ESC-PTP low-risk group (2.1%), suggesting that clinical judgement may be most useful in reclassifying higher risk patients, while ESC-PTP may be most useful in identifying those at lowest risk. In addition to better predicting CAD, the physician-determined high-risk subgroup was associated with a significantly greater risk of major adverse events (all-cause death/myocardial infarction/unstable angina hospitalisation), whereas none of the D-F or ESC-PTP risk groups were associated with events, although we acknowledge that these scores were not designed specifically to predict outcomes.
Unlike D-F or ESC-PTP, physician estimates are not fixed or singular but a ‘mixture’ of individual physicians’ estimates (varying with, eg, age, sex, training and experience) that could produce different levels of predictive performance. However, our results suggest that, even as a mixture, clinical judgement better delineates both disease prevalence and adverse event risk than a standard risk score. Similar findings have been described in acute chest pain, where unstructured clinical impression performed better than some contemporary risk scores in excluding myocardial infarction.19 We speculate that this is because physicians integrate multiple factors into risk assessment beyond traditional demographics, risk factors, signs and symptoms, including the severity of risk factors and symptoms rather than merely their presence, as well as subjective impressions that may be difficult to quantify.
This analysis provides several important and novel insights. First, to our knowledge, this is the first systematic evaluation of a priori clinical judgement in a large, heterogeneous group of patients with stable chest pain across multiple sites, making the results highly generalisable. Incorporating clinical judgement may be an optimal strategy to identify high-risk patients in this population, a strategy employed routinely in the evaluation of possible pulmonary embolism.10 Second, these results extend recent studies emphasising clinical ‘gestalt’, as reflected in the current National Institute for Health and Care Excellence (NICE) guidelines, which base the decision to perform non-invasive testing solely on physician judgement of chest pain typicality and eschew traditional risk factors such as age and sex used in other guidelines, including the recent ESC guidelines.4 20 Third, in an era of electronic medical records and protocolised healthcare, the results bolster the importance of clinical thinking and patient-centred care in determining cardiovascular risk despite advances in technology. Fourth, because this analysis demonstrates the superiority of physician judgement over a traditional risk score in the assessment of chest pain, it provides further impetus to develop strategies to improve clinical judgement among trainees.21 Finally, although the D-F score has served the medical community well for decades, this work extends prior evidence that D-F provides no incremental benefit in the contemporary evaluation of chest pain, suggesting that it may be avoided in future research and clinical work,22–24 and, for the first time, documents that the contemporary ESC-PTP score similarly adds limited benefit over physician estimates.
This study has some limitations. First, our cohort was at relatively low risk for both obstructive CAD and events. It is unclear how these results might apply to a higher risk group with a greater prevalence of disease or incidence of events (including the original D-F derivation cohort), or to patients who did not consent to participate in a clinical trial. However, the results come from the largest contemporary real-world evaluation of the role of non-invasive testing among patients with stable chest pain, which strengthens their generalisability. Second, the vast majority of physicians in this analysis were cardiologists, with much smaller proportions of internal medicine specialists and other providers, making analysis by specialty difficult. It would be informative to understand the relationship between other physician characteristics (eg, age, sex and years in practice) and outcomes, but these data were not captured in PROMISE. Previous work in chronic coronary disease has not demonstrated a clear link between clinical experience and clinical outcomes.25 Even so, given the number of sites and physicians involved, we would expect our results to be consistent across a range of physician characteristics and experience. Third, CAD was diagnosed by CCTA in our study. While acceptance of CCTA as a first-line modality among patients with chest pain at low to intermediate pretest probability of obstructive CAD is growing, invasive coronary angiography is still considered the gold standard. Nevertheless, the most recent ESC guidelines4 based pretest probability on a pooled analysis3 that includes CCTA data, including an analysis from the PROMISE trial.26 Finally, the relationship between physician estimates and risk scores may differ for scores or PTP cut-offs other than those used in this analysis, particularly given recent recognition that D-F overestimates the actual presence of obstructive CAD.3 4 26 While D-F was the contemporaneous comparator, we strengthened this analysis by also comparing physician estimates with ESC-PTP. Compared with the D-F intermediate PTP group, ESC-PTP estimates better delineated the presence of obstructive disease among patients at indeterminate pretest probability.
Physician, D-F and ESC-PTP estimates were all able to stratify the probability of obstructive CAD in patients with stable chest pain; however, agreement between physician estimates and either score was poor. Compared with the D-F score, physician judgement and ESC-PTP more accurately identified obstructive CAD. However, only physician judgement was able to stratify patients by risk of adverse outcomes. These results support the development of improved approaches to predict CAD prevalence and risk among patients with stable chest pain, including formal integration of physician judgement and risk estimation.
What is already known on this subject?
Pretest probability scores have historically been used to guide risk stratification and clinical assessment of patients with suspected coronary artery disease (CAD).
Informal physician judgement is also used to estimate risk in patients with stable chest pain and suspected CAD, but it is unclear how judgement alone compares with traditional risk scores such as Diamond-Forrester (D-F) or European Society of Cardiology (ESC) pretest estimates.
What might this study add?
Compared with D-F estimates, physician judgement and ESC more accurately identified patients with stable chest pain who had obstructive CAD.
However, only physician judgement (and not ESC or D-F) predicted worse outcomes, in patients with the highest estimated probability of CAD.
Clinical judgement may be a better determinant of risk than a single risk score.
How might this impact on clinical practice?
These results support efforts to integrate physician judgement to improve risk prediction among patients with stable chest pain, and support recent guideline recommendations that see a limited role for formal pretest probability scoring in the clinical assessment of patients with suspected CAD.
Patient consent for publication
This study involves human participants and was approved by the local or central institutional review board at each coordinating centre and at each of the 193 enrolling sites in North America. Participants gave informed consent to participate in the study before taking part.
We thank Adrian Coles, PhD, for his statistical assistance, and Peter Hoffmann for his editorial contributions to this manuscript. Dr Coles and Mr Hoffmann did not receive compensation for their assistance, apart from their employment at the Duke Clinical Research Institute.
Twitter @pattypellikka, @pamelasdouglas
Contributors CBF acts as guarantor and takes responsibility for the overall content of the manuscript. PD was involved in the design of the analysis and provided critical review. CLH and BA provided statistical analysis and support. DM, PAP, UH and MRP critically reviewed the manuscript.
Funding The PROMISE trial was funded by the National Heart, Lung, and Blood Institute grants R01 HL098237, R01 HL098236, R01 HL098305 and R01 HL098235.
Disclaimer The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. The views expressed in this article do not necessarily represent the official views of the National Heart, Lung, and Blood Institute.
Competing interests CBF: Consulting fees/honoraria from Bayer, Novo Nordisk, Sanofi, Boehringer Ingelheim, Pfizer; research support from Bayer; Steering Committee service for HeartFlow. DM: Consultant fees/honoraria from Medtronic; research support from AGA Medical, AstraZeneca, Bayer Healthcare Pharmaceuticals, BMS, Eli Lilly, Gilead, Merck & Co., Inc. UH: Research support from HeartFlow. MRP: Consultant fees/honoraria from Bayer Healthcare, Genzyme, Medscape - theheart.org, Merck; research support from AHRQ, AstraZeneca, Janssen, Johnson & Johnson, Maquet, National Heart Lung and Blood Institute, PCORI. PD: Research support from HeartFlow. No other disclosures were reported.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.