Pareto distribution

The original power law. Vilfredo Pareto noticed that a small fraction of Italians held most of the land. The same pattern shows up elsewhere: a small fraction of words accounts for most of the text in a book, and a small fraction of bugs causes most of the crashes. All of them, and many more, follow the same family of distributions.

$$f(x; x_m, α) = \frac{α\, x_m^{α}}{x^{α + 1}}, \qquad x \geq x_m$$

Readouts: Mean 2.00, Variance ∞, Visible samples 98.7%.

Slider settings: xₘ (scale) = 1.0, α (shape) = 2.0, n (samples) = 2000.

PDF: $f(x; x_m, α) = \frac{α\, x_m^{α}}{x^{α + 1}}$ for $x \geq x_m$. The mean is finite only for $α > 1$, the variance only for $α > 2$. Bars past the visible window are clipped.
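A histogram like the one described in the caption can be reproduced with a minimal sketch along these lines. It assumes inverse-transform sampling from the CDF $F(x) = 1 - (x_m / x)^{α}$; the function names and the window edge `x_max` are illustrative choices, not values taken from the demo.

```python
import random

def pareto_pdf(x, x_m=1.0, alpha=2.0):
    """Density f(x; x_m, alpha); zero below the scale x_m."""
    if x < x_m:
        return 0.0
    return alpha * x_m**alpha / x ** (alpha + 1)

def sample_pareto(n, x_m=1.0, alpha=2.0, seed=0):
    """Draw n samples by inverting the CDF F(x) = 1 - (x_m / x)**alpha."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids division by zero.
    return [x_m / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]

def visible_fraction(samples, x_max):
    """Share of samples that fall at or below the window edge x_max."""
    return sum(s <= x_max for s in samples) / len(samples)

samples = sample_pareto(2000, x_m=1.0, alpha=2.0)
x_max = 10.0  # hypothetical window edge, not taken from the demo
print(f"visible in sample: {visible_fraction(samples, x_max):.1%}")
print(f"expected from CDF: {1 - (1.0 / x_max) ** 2.0:.1%}")
```

The sample fraction should land close to the CDF value $1 - (x_m / x_{\max})^{α}$, with the gap shrinking as `n` grows.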

What to notice

The shape $α$ controls the tail. Values of $α$ just above 1 mean extremely heavy tails, where the mean is finite but only barely. Values above 2 give a finite variance; above 3, a finite skewness; and so on: the $k$-th moment is finite only for $α > k$. The classic “80/20” rule corresponds to $α = \log_4 5 \approx 1.16$.
Mean and variance can be infinite. When $α \leq 1$, the mean is infinite; when $α \leq 2$, the variance is. Move the $α$ slider down to 1.0 or below and watch the “Mean” readout flip to ∞.
Scale-free tails. The complementary CDF is a pure power law:

$$P(X > x) = \left( \frac{x_m}{x} \right)^{α}, \qquad x \geq x_m$$

which means the relative probability of exceeding $2x$ versus $x$ is $2^{-α}$ for every $x \geq x_m$: it depends on neither $x_m$ nor $x$. That self-similarity is why Pareto tails look straight on a log-log plot, with slope $-α$.
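The scale-free property can be checked numerically. The sketch below evaluates the complementary CDF at several points and scales; the helper name `ccdf` and the chosen values of $α$, $x_m$ and $x$ are arbitrary illustrations.

```python
import math

def ccdf(x, x_m=1.0, alpha=2.0):
    """P(X > x) for a Pareto(x_m, alpha) variable."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

alpha = 1.5
for x_m in (1.0, 3.0):
    for x in (5.0, 50.0, 500.0):
        ratio = ccdf(2 * x, x_m, alpha) / ccdf(x, x_m, alpha)
        # Each ratio matches 2**(-alpha), whatever x_m and x are.
        print(x_m, x, round(ratio, 6), round(2 ** -alpha, 6))

# Slope of log P(X > x) against log x between two points in the tail.
x1, x2 = 10.0, 1000.0
slope = (math.log(ccdf(x2, 1.0, alpha)) - math.log(ccdf(x1, 1.0, alpha))) / (
    math.log(x2) - math.log(x1)
)
print(slope)  # approximately -alpha
```

Because the ratio is the same at every scale, two points in the tail are enough to read off the slope on a log-log plot.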

Power laws in the wild

Wealth, city populations, earthquake energies, book sales, internet link-counts, word frequencies, firm sizes, and file sizes on web servers all exhibit Pareto-like tails. When data looks roughly Gaussian in the middle but has outliers several orders of magnitude beyond the bulk, a Pareto tail is the usual culprit.

$$E[X] = \frac{α\, x_m}{α - 1}, \qquad α > 1$$

$$\operatorname{Var}(X) = \frac{x_m^{2}\, α}{(α - 1)^{2} (α - 2)}, \qquad α > 2$$
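The two moment formulas translate directly into code. The sketch below also checks the “80/20” value of $α$ quoted earlier, using a standard result that is not stated in the text above: for $α > 1$, the top fraction $p$ of a Pareto population holds a share $p^{1 - 1/α}$ of the total. The function names are illustrative.

```python
import math

def pareto_mean(alpha, x_m=1.0):
    """E[X]; infinite when alpha <= 1."""
    return alpha * x_m / (alpha - 1) if alpha > 1 else math.inf

def pareto_variance(alpha, x_m=1.0):
    """Var(X); reported as infinite whenever alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return x_m**2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))

def top_share(p, alpha):
    """Fraction of the total held by the top fraction p (assumes alpha > 1)."""
    return p ** (1 - 1 / alpha)

# Slider settings from the demo: alpha = 2, x_m = 1.
print(pareto_mean(2.0), pareto_variance(2.0))

# Solving 0.2 ** (1 - 1/alpha) = 0.8 gives alpha = log(5) / log(4).
alpha_8020 = math.log(5) / math.log(4)
print(round(alpha_8020, 3), round(top_share(0.2, alpha_8020), 3))
```

With $α = 2$ and $x_m = 1$ this reproduces the readouts shown by the demo: a mean of 2 and an infinite variance.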