In statistics, a spurious relationship (or, sometimes, spurious correlation) is a mathematical relationship in which two occurrences have no causal connection, yet a causal connection may be inferred because both depend on a third, unseen factor (referred to as a "confounding factor" or "lurking variable"). The spurious relationship gives the impression of a meaningful link between the two that does not hold up when examined objectively.
The misleading correlation between two variables is produced through the operation of a third causal variable. In other words, when we find a correlation between A and B, there are three possible relationships:
- A causes B,
- B causes A,
- C causes both A and B.
The last is a spurious correlation, which is why it is often said that "correlation does not imply causation".
An example of a spurious relationship can be seen by examining a city's ice cream sales. These sales are highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice versa, would be to read causation into a spurious relationship between the two. In reality, a heat wave may have caused both. The heat wave is an example of a hidden or unseen variable.
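The ice cream example can be sketched as a small simulation. Everything in it is an assumption made for illustration: the variable names, the linear dependence on temperature, and the noise levels are invented, not taken from any real data set. The sketch generates both series from a shared temperature variable, then compares the raw correlation with the correlation that remains after the linear effect of temperature is removed from each series (a partial correlation).

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_days = 2000

# Hypothetical model: temperature (C) drives both series (A and B);
# neither series has any influence on the other.
temperature = [rng.gauss(25, 5) for _ in range(n_days)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
drowning_index = [0.2 * t + rng.gauss(0, 0.5) for t in temperature]

raw_r = pearson(ice_cream_sales, drowning_index)
partial_r = pearson(residuals(ice_cream_sales, temperature),
                    residuals(drowning_index, temperature))

print(f"raw correlation:                 {raw_r:.2f}")
print(f"correlation given temperature:   {partial_r:.2f}")
```

Under these assumptions the raw correlation is strongly positive, while the correlation that remains once temperature is accounted for is close to zero. In real data the confounder is often unmeasured, so this kind of adjustment is only possible when the third variable has been identified and recorded.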
Another popular example is a series of Dutch statistics showing a positive correlation between the number of storks nesting in a series of springs and the number of human babies born at that time. Of course there was no causal connection; they were correlated with each other only because they were correlated with the weather nine months before the observations (Sapsford, Roger; Jupp, Victor, eds. (2006). Data Collection and Analysis. Sage. ISBN 0-7619-4362-5).
The term is commonly used in statistics and in particular in experimental research techniques. Experimental research attempts to understand and predict causal relationships (X → Y). A non-causal correlation can be spuriously created by an antecedent which causes both (W → X & Y). Intervening variables (X → W → Y), if undetected, may make indirect causation look direct. Because of this, experimentally identified correlations do not represent causal relationships unless spurious relationships can be ruled out.
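A rough way to see why experiments help is to simulate both structures and compare observation with intervention. The sketch below is illustrative only: the function `simulate`, its parameters, and the linear Gaussian models are assumptions introduced here. In the "confounder" structure (W → X & Y), assigning X at random cuts its link to W, so the correlation with Y disappears. In the "mediator" structure (X → W → Y), X really does influence Y, so the correlation survives randomization.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(structure, randomize_x, n=5000, seed=1):
    """Return corr(X, Y) under a hypothetical linear model.

    structure   -- "confounder" (W -> X and W -> Y) or "mediator" (X -> W -> Y)
    randomize_x -- if True, X is assigned at random, as in an experiment
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        if structure == "confounder":
            w = rng.gauss(0, 1)
            # Observationally X depends on W; under randomization it does not.
            x = rng.gauss(0, 1) if randomize_x else w + rng.gauss(0, 0.5)
            y = w + rng.gauss(0, 0.5)  # Y depends on W only, never on X
        elif structure == "mediator":
            # X has no causes in this model, so randomizing it changes nothing.
            x = rng.gauss(0, 1)
            w = x + rng.gauss(0, 0.5)
            y = w + rng.gauss(0, 0.5)  # Y depends on X through W
        else:
            raise ValueError(f"unknown structure: {structure!r}")
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


confounder_observed = simulate("confounder", randomize_x=False)
confounder_randomized = simulate("confounder", randomize_x=True)
mediator_observed = simulate("mediator", randomize_x=False)
mediator_randomized = simulate("mediator", randomize_x=True)

print(f"confounder, observed:   {confounder_observed:.2f}")
print(f"confounder, randomized: {confounder_randomized:.2f}")
print(f"mediator, observed:     {mediator_observed:.2f}")
print(f"mediator, randomized:   {mediator_randomized:.2f}")
```

Note that the observed correlations in the two structures look much alike; only the intervention tells them apart, and even then it cannot by itself show whether the causation is direct or runs through an intervening variable.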
In practice, three conditions must be met in order to conclude that X causes Y, directly or indirectly:
- X must precede Y
- Y must not occur when X does not occur
- Y must occur whenever X occurs
Spurious relationships can often be identified by considering whether any of these three conditions has been violated.
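The three conditions above can be checked mechanically against a set of recorded observations. The following is a minimal sketch; the `Trial` record, the `check_conditions` function, and the sample data are hypothetical and exist only for illustration. Each trial records the time at which X and Y occurred, or `None` if the event did not occur.

```python
from typing import List, NamedTuple, Optional


class Trial(NamedTuple):
    """One observation: when X and Y occurred (None means 'did not occur')."""
    x_time: Optional[float]
    y_time: Optional[float]


def check_conditions(trials: List[Trial]) -> dict:
    """Evaluate the three conditions over all recorded trials."""
    both = [t for t in trials if t.x_time is not None and t.y_time is not None]
    return {
        # 1. X must precede Y (checked where both events occurred).
        "x_precedes_y": all(t.x_time < t.y_time for t in both),
        # 2. Y must not occur when X does not occur.
        "no_y_without_x": all(t.x_time is not None
                              for t in trials if t.y_time is not None),
        # 3. Y must occur whenever X occurs.
        "y_whenever_x": all(t.y_time is not None
                            for t in trials if t.x_time is not None),
    }


# Invented data: X = high ice cream sales, Y = a drowning.
ice_cream_trials = [
    Trial(x_time=1.0, y_time=2.0),   # X then Y
    Trial(x_time=None, y_time=3.0),  # Y without X
    Trial(x_time=2.0, y_time=None),  # X without Y
    Trial(x_time=5.0, y_time=4.0),   # Y before X
]

# Invented data in which every condition holds.
consistent_trials = [
    Trial(x_time=0.0, y_time=1.0),
    Trial(x_time=None, y_time=None),
    Trial(x_time=2.0, y_time=2.5),
]

ice_cream_result = check_conditions(ice_cream_trials)
consistent_result = check_conditions(consistent_trials)
print(ice_cream_result)
print(consistent_result)
```

A check of this kind can only flag violations in the data at hand. Passing all three tests on a finite sample does not establish causation, since a violating case may simply not have been observed.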
The final condition may be relaxed in the case of indirect causation. For example, consider a pistol duel. Two men face off and fire at each other. If one man dies as a result of the other man's shot, we can rightly conclude that the other man caused his death. However, if a doctor saves the wounded man's life (thus violating the third condition), this does not undermine causation, only direct causation. It is the biological damage (W) sustained from the shot (X), not the shot itself, that causes death (Y), which leaves room for medical intervention.