Regression to the mean
Take the tallest children in a class. Their parents tend to be taller than average, but not as tall as the children themselves. The same holds for repeated measurements: retest the top scorers on an exam next week and their scores will, on average, drift toward the class mean. This is regression to the mean, and it requires no explanation beyond probability.
What to notice
- Orange dots are parents in the top quartile (X > +0.67σ). Red dot is their children’s mean.
- In expectation, the red dot lies below the orange cluster: in standardized units, children’s mean Y = r·(mean X of parents) < mean X of parents, since r < 1 and the selected parents’ mean X is positive.
- r near 1: strong inheritance — children’s mean barely moves. r near 0: children’s mean collapses to zero regardless of parents.
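- The blue regression line captures the effect exactly: in standardized units its slope is r, so at every parental value X it gives the children’s expected value.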
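The observations above can be checked with a small simulation. This is a minimal sketch, not the code behind the plot: it assumes standardized, bivariate-normal parent and child values, and the function name `simulate_top_quartile`, the sample size, and the seed are illustrative choices. It uses only the Python standard library.

```python
import math
import random
import statistics

def simulate_top_quartile(r, n=100_000, seed=0):
    """Return (mean parent X, mean child Y) for parents in the top quartile.

    Assumes X and Y are standard normal with correlation r.
    """
    rng = random.Random(seed)
    cutoff = 0.6745  # approximate 75th percentile of a standard normal
    parent_vals, child_vals = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Build Y so that corr(X, Y) = r and Var(Y) = 1.
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if x > cutoff:
            parent_vals.append(x)
            child_vals.append(y)
    return statistics.fmean(parent_vals), statistics.fmean(child_vals)

for r in (0.9, 0.5, 0.1):
    mean_x, mean_y = simulate_top_quartile(r)
    print(f"r={r}: parents {mean_x:.3f}, children {mean_y:.3f}, "
          f"r*parents {r * mean_x:.3f}")
```

For each r, the children's mean should land close to r times the parents' mean, and closer to zero as r shrinks.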
Why it happens
The math is simple: in a bivariate normal with correlation r, with both variables standardized to mean 0 and standard deviation 1, the conditional expectation is E[Y | X = x] = r·x. For any |r| < 1, extreme values of X predict less-extreme values of Y. There’s no force pulling children toward mediocrity; the effect is entirely a consequence of imperfect correlation.
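As a worked illustration, the conditional expectation can be written as a short function. The sketch below uses the general bivariate-normal form E[Y | X = x] = μ_Y + r·(σ_Y/σ_X)·(x − μ_X), which reduces to r·x in standardized units. The function name `expected_child` and the height figures (mean 170, standard deviation 7) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def expected_child(x, r, mean_x=0.0, sd_x=1.0, mean_y=0.0, sd_y=1.0):
    """Conditional expectation E[Y | X = x] for a bivariate normal."""
    z = (x - mean_x) / sd_x          # how extreme the parent is, in SD units
    return mean_y + r * sd_y * z     # the child is expected to be r times as extreme

# Standardized units: a parent 2 SD above the mean, r = 0.5.
print(expected_child(2.0, r=0.5))  # 1.0

# Hypothetical heights in cm: the parent is 2 SD above the mean (184 vs 170).
print(expected_child(184.0, r=0.5, mean_x=170.0, sd_x=7.0,
                     mean_y=170.0, sd_y=7.0))  # 177.0
```

The expected child is still above average, but only half as far above as the parent, because r = 0.5.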
The practical danger
Regression to the mean produces many false causal stories:
- A student scores unusually low on a test, gets tutoring, scores higher — tutoring gets the credit.
- A sports team has an exceptional season, regresses the next year — the coach gets blamed.
- A company tries an unusual intervention when sales are worst, sales recover — intervention gets the credit.
In each case at least part of the change would be expected without any intervention. Galton named the phenomenon in 1886 while studying human height. The word “regression”, now used for all linear prediction, comes from his description of this pull.
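The tutoring story can be reproduced with no tutoring at all. The sketch below assumes a simple model in which each test score is a stable ability plus independent noise; the function name `retest_without_intervention` and all numbers (mean 70, standard deviations of 10, bottom 10% selected) are hypothetical. Nothing happens to the students between the two tests.

```python
import random
import statistics

def retest_without_intervention(n=50_000, ability_sd=10.0, noise_sd=10.0, seed=1):
    """Return (mean test-1 score, mean test-2 score) for the lowest 10% on test 1."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        ability = rng.gauss(70.0, ability_sd)
        test1 = ability + rng.gauss(0.0, noise_sd)  # ability plus bad or good luck
        test2 = ability + rng.gauss(0.0, noise_sd)  # same ability, fresh luck
        students.append((test1, test2))
    students.sort(key=lambda s: s[0])
    lowest = students[: n // 10]  # the students who would be sent to tutoring
    before = statistics.fmean(s[0] for s in lowest)
    after = statistics.fmean(s[1] for s in lowest)
    return before, after

before, after = retest_without_intervention()
print(f"lowest 10% on test 1: mean {before:.1f}, on test 2: mean {after:.1f}")
```

The selected group scores noticeably higher on the second test while still remaining below the overall mean of 70, which is the improvement an intervention would wrongly be credited with.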