Pareto distribution
The original power law. Vilfredo Pareto noticed that a small fraction of Italians held most of the land. The same pattern turns up elsewhere: a small fraction of words accounts for most of the text in a book, and a small fraction of bugs causes most of the crashes. All of these, and many more, follow the same family of distributions.
What to notice
- The shape $\alpha$ controls the tail. Values of $\alpha$ near 1 mean extremely heavy tails, where the mean is barely finite. Values above 2 give a finite variance; above 3, a finite skewness; and so on. For typical “80/20” datasets $\alpha$ hovers around 1.16.
- Mean and variance can be infinite. When $\alpha \le 1$, the mean is infinite; when $\alpha \le 2$, the variance is. Move the slider to 1.0 or below and watch the “Mean” readout flip to ∞.
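- Scale-free tails. With scale $x_m$ (the minimum value) and shape $\alpha$, the complementary CDF is a pure power law:
$$P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m,$$
which means the relative probability of exceeding $kx$ vs $x$, namely $k^{-\alpha}$, depends on neither $x$ nor $x_m$. That self-similarity is why Pareto tails look straight on a log-log plot.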
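The three points above can be checked numerically. What follows is a minimal sketch using the standard Pareto (Type I) formulas; the function names (`pareto_mean`, `pareto_variance`, `survival`, `top_share`) are made up for illustration and are not part of this page or any library. The 80/20 figure comes from the fact that the largest fraction $p$ of a Pareto population holds a share $p^{1 - 1/\alpha}$ of the total, so setting $0.2^{1 - 1/\alpha} = 0.8$ gives $\alpha = \ln 5 / \ln 4$.

```python
import math


def pareto_mean(alpha, x_m=1.0):
    """Mean of a Pareto(x_m, alpha) variable; infinite when alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_m / (alpha - 1)


def pareto_variance(alpha, x_m=1.0):
    """Variance of a Pareto(x_m, alpha) variable; reported as infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return alpha * x_m**2 / ((alpha - 1) ** 2 * (alpha - 2))


def survival(x, alpha, x_m=1.0):
    """Complementary CDF: P(X > x) = (x_m / x)^alpha for x >= x_m, else 1."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def top_share(p, alpha):
    """Share of the total held by the largest fraction p (needs a finite mean)."""
    if alpha <= 1:
        raise ValueError("top_share requires alpha > 1")
    return p ** (1 - 1 / alpha)


# Shape at which the top 20% hold exactly 80% of the total.
alpha_8020 = math.log(5) / math.log(4)  # about 1.161

# Scale-free check: P(X > 2x) / P(X > x) is the same near the minimum and far out.
alpha = 1.5
ratio_near = survival(2 * 3, alpha) / survival(3, alpha)
ratio_far = survival(2 * 3000, alpha) / survival(3000, alpha)

print("alpha for 80/20:", alpha_8020)
print("top 20% share at that alpha:", top_share(0.2, alpha_8020))
print("mean at alpha = 1.0:", pareto_mean(1.0))
print("exceedance ratios:", ratio_near, ratio_far, 2 ** -alpha)
```

Both exceedance ratios equal $2^{-\alpha}$ up to floating-point rounding, which is the straight line on the log-log plot expressed as a number.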
Power laws in the wild
Wealth, city populations, earthquake energies, book sales, internet link-counts, word frequencies, firm sizes, and file sizes on web servers all exhibit Pareto-like tails. When data looks roughly Gaussian in the middle but has outliers orders of magnitude beyond the bulk, a Pareto-like tail is a likely culprit.
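One way to test whether a dataset has a Pareto-like tail is to estimate the shape from the values above a chosen threshold. The sketch below uses the maximum-likelihood estimate $\hat{\alpha} = n / \sum_i \ln(x_i / x_m)$ on synthetic data drawn with the standard library's `random.paretovariate`. The helper name `fit_alpha`, the sample size, and the seed are illustrative assumptions. With real data the threshold $x_m$ has to be chosen, and the estimate can be sensitive to that choice.

```python
import math
import random


def fit_alpha(samples, x_m):
    """Maximum-likelihood estimate of the Pareto shape from values above x_m."""
    tail = [x for x in samples if x > x_m]
    if not tail:
        raise ValueError("no samples above the threshold x_m")
    log_sum = sum(math.log(x / x_m) for x in tail)
    return len(tail) / log_sum


# Synthetic data: paretovariate(alpha) draws from a Pareto with minimum value 1.
rng = random.Random(0)
true_alpha = 1.5
data = [rng.paretovariate(true_alpha) for _ in range(100_000)]

alpha_hat = fit_alpha(data, x_m=1.0)
print("estimated alpha:", alpha_hat)
```

For a true Pareto sample of this size the estimate should land close to 1.5. On real data, repeating the fit at several thresholds and checking that the estimate stays stable is a common sanity check.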