Base rate neglect
A test for a rare disease is 99% accurate. You test positive. How likely is it that you’re actually sick?
Instinct says about 99%. The real answer, for a disease that afflicts 1% of the population, is only about 50%.
Why so low?
Because the healthy population is enormous relative to the sick one. Out of 10 000 people:
- 100 are sick. The 99% sensitive test catches 99 of them — those are the true positives.
- 9 900 are healthy. The 99% specific test misidentifies 1% of them as positive — that’s 99 false positives.
Among the 198 people who tested positive, only 99 are actually sick: 99 / 198 = 50%.
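The head count above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `positives_breakdown` and its parameters are illustrative, and it reads "99% accurate" as 99% sensitivity and 99% specificity, as the two bullets do.

```python
def positives_breakdown(population, prevalence, sensitivity, specificity):
    """Split the positive test results into true and false positives."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity            # sick people the test catches
    false_pos = healthy * (1 - specificity)  # healthy people wrongly flagged
    return true_pos, false_pos


true_pos, false_pos = positives_breakdown(
    population=10_000, prevalence=0.01, sensitivity=0.99, specificity=0.99
)
share_sick = true_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"chance a positive result means sick: {share_sick:.1%}")
```

The two groups come out the same size, which is why a positive result is a coin flip here.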
What to try
- Drop prevalence to 0.1%. Watch the positive predictive value collapse to about 9% with the test unchanged; you’d need near-perfect specificity for a positive result to mean much.
- Lift prevalence to 10%. Now the same test is far more informative (about 92%), because you’re starting from a higher prior.
- Drag specificity from 95% to 99.9%. Small changes at the high end dramatically improve the positive predictive value; that’s why medical screening often involves a confirmatory test.
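The experiments above can also be run without the sliders. The sketch below applies Bayes’ rule directly; the function name `ppv` is illustrative, and the baseline values (1% prevalence, 99% sensitivity, 99% specificity) are taken from the worked example rather than from any real test.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(sick | positive), via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Vary prevalence, holding the test fixed at 99% / 99%.
for prevalence in (0.001, 0.01, 0.10):
    value = ppv(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"prevalence {prevalence:.1%}: PPV = {value:.1%}")

# Vary specificity, holding prevalence at 1% and sensitivity at 99%.
for specificity in (0.95, 0.99, 0.999):
    value = ppv(0.01, sensitivity=0.99, specificity=specificity)
    print(f"specificity {specificity:.1%}: PPV = {value:.1%}")
```

Under these assumptions the positive predictive value runs from roughly 9% at 0.1% prevalence to roughly 92% at 10%, and from roughly 17% at 95% specificity to roughly 91% at 99.9%.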
Why it matters
Base rates are why screening everyone for rare conditions creates a flood of false positives even with good tests. It’s why security alert systems cry wolf. It’s why predictive policing models that flag rare events are usually wrong about the individual, even when they’re “accurate” in aggregate.
The test doesn’t tell you whether you’re sick. It updates the prior you brought in. If your prior was low, even a strong update often leaves you at or below 50%.
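One way to see the "update" framing is to feed each posterior back in as the next prior. The sketch below does that for two positive results in a row. The function name `update` is illustrative, and the code assumes the two tests err independently given the person’s true status, which real confirmatory tests may not satisfy.

```python
def update(prior, sensitivity, specificity, positive=True):
    """Return P(sick) after one test result, starting from `prior`."""
    if positive:
        like_sick, like_healthy = sensitivity, 1 - specificity
    else:
        like_sick, like_healthy = 1 - sensitivity, specificity
    numerator = prior * like_sick
    return numerator / (numerator + (1 - prior) * like_healthy)


prior = 0.01  # base rate before any testing

# Assumption: the second test's errors are independent of the first's.
after_first = update(prior, sensitivity=0.99, specificity=0.99)
after_second = update(after_first, sensitivity=0.99, specificity=0.99)

print(f"prior:                 {prior:.1%}")
print(f"after one positive:    {after_first:.1%}")
print(f"after second positive: {after_second:.1%}")
```

The same test that moved a 1% prior only to 50% moves a 50% prior to 99%: the evidence is identical each time, and only the starting point differs.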