Towards an epidemiology of the known unknowns in cryptogenic stroke
Issa J Dahabreh1, David M Kent2

1 Center for Clinical Evidence Synthesis, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA
2 Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA

Correspondence to Dr David Kent, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, 800 Washington Street, Box 63, Boston, MA 02111, USA; dkent1{at}

[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don't know we don't know.

US Secretary of Defense Donald Rumsfeld

Ischaemic stroke is a heterogeneous disorder, with many potential causes. According to the TOAST (Trial of Org 10172 in Acute Stroke Treatment) classification, a stroke may be due to large vessel, small vessel or cardioembolic disease, or may have some other determined cause.1 Different stroke mechanisms can point to different strategies for secondary prevention. Yet, even after an extensive work-up, about 30% of ischaemic strokes cannot be classified into any of these four categories and are instead classified as ‘cryptogenic’.

Although one might expect secondary prevention of cryptogenic stroke to be relatively standardised, because the underlying mechanism is unknown in all patients, cryptogenic stroke is itself a heterogeneous entity. Possible disease mechanisms include atheroembolic (eg, in patients with aortic arch atheromas), cardioembolic (eg, in patients with occult paroxysmal atrial fibrillation) and lacunar disease. For patients with cryptogenic stroke found to have a patent foramen ovale (PFO), the possible mechanisms underlying the ischaemic event include all of the above plus paradoxical embolism, that is, a stroke caused when an embolus formed in the venous circulation gains access to the systemic arterial circulation through a right-to-left shunt (RLS). Does the fact that the cause of stroke is unknown in patients with cryptogenic stroke and PFO imply that a uniform prevention strategy should be used? In this issue of Heart, Pezzini and colleagues2 report the results of a case-control study suggesting that there is identifiable variation in the distribution of the ‘known unknown’ aetiologies among patients with cryptogenic stroke and PFO, variation that might ultimately prove important in guiding prevention strategies for this group of patients.

The study used cryptogenic stroke cases identified through the Italian Project on Stroke in Young Adults (IPSYS) and convenience controls to add to the expanding epidemiological evidence of an association between RLS and cryptogenic stroke,3 4 and to confirm the well-established association between atherosclerotic risk factors and stroke. More interestingly, the authors reported a statistically significant interaction between RLS and atherosclerotic risk factor categories on cryptogenic stroke risk. A statistical interaction between two variables describes a relationship in which the association of one variable with the outcome of interest has a different magnitude within strata of the other variable. In this study, among patients with no atherosclerotic risk factors the presence of RLS was associated with more than fivefold higher odds of cryptogenic stroke (OR=5.36,i using the counts in table 2 of the manuscript); however, among patients with an atherosclerotic risk score of 1 or higher, the effect of RLS was substantially smaller (OR=2.43). Indeed, with greater atherosclerotic burden, the effect of RLS appeared to attenuate completely.
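The arithmetic behind such an interaction can be sketched in a few lines of Python. The example below computes an OR from a 2x2 case-control table and summarises the interaction as the ratio of the two stratum-specific ORs quoted in the text (5.36 and 2.43). The function names and the example counts are hypothetical and are not the counts from the study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def ratio_of_odds_ratios(or_stratum_a, or_stratum_b):
    """Multiplicative interaction: how many times larger one stratum OR is."""
    return or_stratum_a / or_stratum_b


# Hypothetical counts (NOT from the study), only to show the arithmetic.
example_or = odds_ratio(exposed_cases=40, unexposed_cases=60,
                        exposed_controls=20, unexposed_controls=80)

# Stratum-specific ORs as quoted in the text.
or_no_risk_factors = 5.36
or_with_risk_factors = 2.43
interaction = ratio_of_odds_ratios(or_no_risk_factors, or_with_risk_factors)

print(f"example OR from hypothetical counts: {example_or:.2f}")
print(f"ratio of stratum ORs: {interaction:.2f}")
```

A ratio of 1 would indicate no interaction on the OR scale. A formal assessment would also need a confidence interval, which this sketch omits.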

At first glance, this result may be taken to suggest that the presence of atherosclerotic risk factors somehow protects patients from RLS-associated strokes. While this is consistent with the data, a more clinically plausible explanation is that the interaction arises because cryptogenic stroke is a heterogeneous disorder that may be due to arterial atherosclerotic disease (in the presence of atherosclerotic risk factors) or venous thromboembolic disease (ie, paradoxical embolism, in the presence of an RLS). Because cryptogenic stroke is a common effect of these two different mechanisms, risk factors for one of them will appear to ‘protect’ stroke patients from risk factors for the other. This phenomenon has been previously reported in studies of cryptogenic stroke5–7 and can also be seen among the group of cases in the IPSYS study.8 Because RLS is a congenital heart abnormality not known to cause or prevent atherosclerotic disease (or dyslipidaemia, diabetes, hypertension or nicotine dependence), the negative association may at first appear paradoxical.9 However, when two factors contribute to the risk of an outcome, conditioning on the outcome (eg, examining only stroke cases) induces dependence between the factors, even when they are independently distributed in the source population.10 Heuristically, the association may be thought to arise because patients with RLS do not require other risk factors to develop a cryptogenic stroke; the negative association will be present among individuals in the case group specifically because they are selected on the basis of the presence of a disease caused by both factors.9
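The induced dependence can be illustrated with an exact calculation rather than real data. The Python sketch below assumes that RLS and atherosclerotic risk factors are independent in the source population and that each raises stroke risk through its own pathway. It then compares the RLS and risk factor association in the whole population with the association among stroke cases only. All prevalences and risks are hypothetical values chosen for illustration.

```python
from itertools import product

# All numbers below are hypothetical and chosen only for illustration.
P_RLS = 0.25          # assumed prevalence of RLS in the source population
P_ATHERO = 0.30       # assumed prevalence of atherosclerotic risk factors
BASE_RISK = 0.001     # assumed stroke risk from other causes
RISK_RLS = 0.004      # assumed added risk via paradoxical embolism
RISK_ATHERO = 0.004   # assumed added risk via atherosclerotic disease


def stroke_risk(rls, athero):
    """Risk when each mechanism acts through its own independent pathway."""
    p_no_stroke = (1 - BASE_RISK) * (1 - RISK_RLS * rls) * (1 - RISK_ATHERO * athero)
    return 1 - p_no_stroke


def association(weights):
    """OR between RLS and risk factors from a dict keyed by (rls, athero)."""
    return (weights[1, 1] * weights[0, 0]) / (weights[1, 0] * weights[0, 1])


population = {}
cases = {}
for rls, athero in product((0, 1), repeat=2):
    # Independence in the source population: joint probability is a product.
    p = (P_RLS if rls else 1 - P_RLS) * (P_ATHERO if athero else 1 - P_ATHERO)
    population[rls, athero] = p
    # Conditioning on stroke: weight each cell by its stroke risk.
    cases[rls, athero] = p * stroke_risk(rls, athero)

or_population = association(population)
or_cases = association(cases)
print(f"RLS vs risk factors, source population: OR = {or_population:.2f}")
print(f"RLS vs risk factors, stroke cases only: OR = {or_cases:.2f}")
```

Under these assumed inputs the OR in the source population is 1 by construction, whereas the OR among cases falls below 1, which is the apparent ‘protective’ pattern described in the text.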

Because the association between risk factors of two different causal mechanisms is present among patients in the case group but not among controls sampled from the general population, it can explain the statistical interaction observed in the study by Pezzini and colleagues. However, using the interaction to make inferences about the relative probability of each of these mechanisms in a given patient with stroke requires a more specific model of causation. In figure 1, we present one such model, where the two mechanisms are assumed to cause strokes through completely independent pathways (ie, atherosclerotic risk factors do not cause stroke by paradoxical embolism, presence of PFO does not cause stroke by atherosclerotic risk factors, PFO and atherosclerotic risk factors do not share any common causes and each stroke event can be attributed to only one of the two mechanisms). Under these fairly strong assumptions, the proportion of PFOs that are ‘pathogenic’ becomes estimable.4
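One way to see why the pathogenic fraction becomes estimable under these assumptions is the short derivation below. The symbols are introduced here for illustration and do not appear in the original: p_1 is the PFO prevalence among cases, p_0 is the PFO prevalence among controls (assumed equal to the prevalence among cases whose stroke was atherosclerotic), and f is the fraction of all cryptogenic strokes caused by paradoxical embolism.

```latex
\begin{align*}
p_1 &= f + (1 - f)\,p_0
  \quad\Longrightarrow\quad
  f = \frac{p_1 - p_0}{1 - p_0}, \\[4pt]
\Pr(\text{PFO pathogenic} \mid \text{case with PFO})
  &= \frac{f}{p_1}
   = \frac{p_1 - p_0}{p_1\,(1 - p_0)}
   = 1 - \frac{1}{\mathrm{OR}},
\qquad
\mathrm{OR} = \frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}.
\end{align*}
```

Taken at face value, an OR of 5.36 would correspond to 1 - 1/5.36, or roughly 0.81, and an OR of 2.43 to roughly 0.59. Both figures hold only under the strong assumptions stated above.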

Figure 1

The figure presents a hypothetical model that would lead to an attenuation of the PFO effect with increasing atherosclerotic risk factor burden, as was observed in the case-control study by Pezzini and colleagues.2 Panel A shows the number of cryptogenic strokes in patients with PFO (red) and without PFO (grey). Here we assume that a single stroke cannot be caused by both paradoxical embolism and atherosclerotic disease. The horizontal dashed line separates strokes where the PFO was pathogenic (below the line; the risk of which is constant across all values of atherosclerotic risk burden), from strokes caused by atherosclerotic disease (above the line). The proportion of patients ‘with’ versus ‘without’ PFO among those with atherosclerotic strokes (red area above the horizontal dashed line and grey area) remains constant at any given level of atherosclerotic risk (and is assumed to be the same as in the control population, not shown). Panel B shows the frequency of PFO among the cases across levels of atherosclerotic risk. The relative frequency of PFO decreases with increasing atherosclerotic burden. While it is possible to observe the relationship between the proportion of patients with and without PFO (ie, the red vs the grey areas), it is not possible to determine which PFOs are causally related to stroke and which are incidental (ie, the position of the dashed line is not known). Panel C shows the OR for the PFO effect. This compares the odds of having a PFO among cases (panel B) with the odds among controls (assumed to be constant, not shown), with increasing atherosclerotic risk burden. The change in the magnitude of the OR over atherosclerotic risk corresponds to the statistical interaction observed by Pezzini and colleagues.
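The model in the figure can be reproduced numerically. The Python sketch below uses a constant number of paradoxical-embolism strokes, a rising number of atherosclerotic strokes, and an incidental PFO prevalence equal to that in controls. From these it derives the PFO prevalence among cases (panel B) and the OR relative to controls (panel C). Every input value is hypothetical and chosen only to show the qualitative pattern.

```python
# Hypothetical inputs, chosen only to reproduce the qualitative pattern.
P_PFO_CONTROLS = 0.25            # assumed PFO prevalence in controls
PARADOXICAL_STROKES = 30.0       # assumed constant across risk levels
ATHEROSCLEROTIC_STROKES = [10.0, 40.0, 90.0, 160.0]  # assumed to rise with risk score


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def pfo_prevalence_in_cases(athero_strokes):
    """Panel B: all paradoxical strokes have a PFO; atherosclerotic strokes
    carry a PFO at the control prevalence (incidental PFO)."""
    with_pfo = PARADOXICAL_STROKES + P_PFO_CONTROLS * athero_strokes
    return with_pfo / (PARADOXICAL_STROKES + athero_strokes)


def pfo_odds_ratio(athero_strokes):
    """Panel C: odds of PFO among cases relative to controls."""
    return odds(pfo_prevalence_in_cases(athero_strokes)) / odds(P_PFO_CONTROLS)


prevalences = [pfo_prevalence_in_cases(a) for a in ATHEROSCLEROTIC_STROKES]
odds_ratios = [pfo_odds_ratio(a) for a in ATHEROSCLEROTIC_STROKES]

for score, (prev, or_value) in enumerate(zip(prevalences, odds_ratios)):
    print(f"risk score {score}: PFO prevalence {prev:.2f}, OR {or_value:.2f}")
```

With these assumed inputs both the prevalence and the OR fall as the atherosclerotic burden rises, and the OR approaches 1 without any biological interaction between the two mechanisms.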

Of course, as Secretary Rumsfeld discovered in another context, reality does not always conform to our simple models; there are also ‘unknown unknowns’.11 Here, the true causal structure is likely to be more complex than we have assumed, involving unidentified factors (‘unknown unknowns’) and causal interactions between both known and unknown factors. Temporal changes in the values of some of these factors can be better explored with longitudinal study designs and appropriate modelling approaches if causal inferences are to be drawn.12 Further, readers should keep in mind that interaction effects in regression models may depend on the analysis scale (eg, be present on the OR but not the risk difference scale) and on the underlying (‘true’) probability model generating the observations. As always, research studies are subject to sampling variation and promising findings need to be externally validated.
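The scale dependence of interaction can be shown with a small numerical example. In the Python sketch below, the four risks are hypothetical and constructed so that one factor adds the same absolute risk in both strata of the other. There is therefore no interaction on the risk difference scale, yet the stratum-specific ORs differ.

```python
# Hypothetical risks of the outcome, keyed by (factor_a, factor_b).
risk = {(0, 0): 0.10, (1, 0): 0.20, (0, 1): 0.30, (1, 1): 0.40}


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def risk_difference(b):
    """Effect of factor A on the risk difference scale within stratum b."""
    return risk[1, b] - risk[0, b]


def odds_ratio(b):
    """Effect of factor A on the OR scale within stratum b."""
    return odds(risk[1, b]) / odds(risk[0, b])


for b in (0, 1):
    print(f"stratum b={b}: RD = {risk_difference(b):.2f}, OR = {odds_ratio(b):.2f}")
```

The same data therefore show an ‘interaction’ or not depending on the scale chosen, which is why the scale should be stated whenever an interaction is reported.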

Even if the exact contribution of each mechanism is not accurately estimable, the importance of weighing the relative contribution of competing causal pathways stems from the availability of different secondary prevention strategies. Statins and antiplatelet therapy (and antihypertensives, as indicated) are recommended for secondary prevention of stroke from atherosclerotic disease. Anticoagulant therapy is recommended for secondary prevention of cardioembolic stroke from atrial fibrillation. Percutaneous closure is an appealing (though unproven13) therapy for strokes caused by paradoxical embolism. The potential of these preventive interventions for reducing stroke recurrence may depend on the underlying stroke mechanism. Although cryptogenic strokes may be due to any of these aetiologies, ongoing epidemiological research14 could permit evidence-based ‘profiling’ to help guide secondary stroke prevention even when the diagnostic work-up is unable to provide us conclusively with ‘known knowns’.



  • Funding This study was partially funded by grants UL1 RR025752 and R01 NS062153, both from the National Institutes of Health. The funder had no role in the preparation, review or approval of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.

  • i Given the case-control design of the study, we focus on results presented in the OR scale, despite the advantages of the risk difference scale for causal inference. Risk difference calculations are not possible from case-control data without making additional assumptions, or relying on external information on the source population.
