Jensen’s inequality
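If $f$ is convex, averaging before applying $f$ gives an answer no larger than applying $f$ first and then averaging:
$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)].$$
For concave $f$ the inequality reverses: $f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$.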
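Here is a minimal Python sketch of the two quantities the statement compares. It assumes a log-normal sample, so that log and √x are defined, and it uses plain sample averages in place of expectations. The names `jensen_gap` and `results` are illustrative and do not come from the original. The sample mean is the exact expectation under the empirical distribution, so the sign of each gap is guaranteed, not merely likely.

```python
import math
import random

def jensen_gap(f, samples):
    """Return (f(E[X]), E[f(X)], E[f(X)] - f(E[X])) under the empirical distribution."""
    mean_x = math.fsum(samples) / len(samples)
    mean_fx = math.fsum(f(x) for x in samples) / len(samples)
    return f(mean_x), mean_fx, mean_fx - f(mean_x)

random.seed(0)
# Assumed setup: log-normal draws are always positive, so log and sqrt are defined.
xs = [random.lognormvariate(1.0, 0.6) for _ in range(10_000)]

results = {}
for name, f, kind in [
    ("x^2", lambda x: x * x, "convex"),
    ("log", math.log, "concave"),
    ("sqrt", math.sqrt, "concave"),
]:
    results[name] = jensen_gap(f, xs)
    f_mean, mean_f, gap = results[name]
    print(f"{name:>4} ({kind}): f(E[X])={f_mean:.3f}  E[f(X)]={mean_f:.3f}  gap={gap:+.3f}")
```

The convex row comes out with a positive gap and the concave rows with a negative one.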
What to notice
- Convex f puts the amber dot above the indigo: $\mathbb{E}[f(X)]$ is at least $f(\mathbb{E}[X])$. For a convex function, the chord joining two points on the curve lies on or above the curve between them. Jensen is the distributional version of that geometric fact.
- Concave f flips the sign. Switch to log or √x. Now indigo sits above amber. Same inequality, other direction.
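- The gap grows with the spread of $X$. Crank σ up. The sample spreads out, the chord gets longer, and the distance between the chord’s midpoint and the curve widens. For $f(x) = x^2$ the gap is exactly $\operatorname{Var}(X)$. An affine $f$ gives zero gap. In general, equality holds exactly when $f$ is affine on the smallest interval containing the values of $X$, which includes the trivial case of a constant $X$.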
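For $f(x) = x^2$ the gap $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ is the variance itself, so the claim that the gap grows with spread can be checked directly. This is a minimal sketch that assumes normal samples with mean 3 and a few hand-picked values of σ standing in for the slider. `square_gap`, `rows`, and `affine_gap` are illustrative names.

```python
import random
import statistics

def square_gap(xs):
    """Jensen gap E[X^2] - (E[X])^2 for the empirical distribution of xs."""
    mean_x = statistics.fmean(xs)
    return statistics.fmean(x * x for x in xs) - mean_x * mean_x

random.seed(1)
rows = []
for sigma in (0.5, 1.0, 2.0, 4.0):
    # Assumed setup: X ~ Normal(3, sigma); sigma plays the role of the slider.
    xs = [random.gauss(3.0, sigma) for _ in range(20_000)]
    rows.append((sigma, square_gap(xs), statistics.pvariance(xs)))
    print(f"sigma={sigma}: gap={rows[-1][1]:.4f}  variance={rows[-1][2]:.4f}")

# An affine f leaves no gap at all (up to floating-point rounding).
xs = [random.gauss(3.0, 2.0) for _ in range(1_000)]
affine_gap = statistics.fmean(2.0 * x + 1.0 for x in xs) - (2.0 * statistics.fmean(xs) + 1.0)
print(f"affine gap={affine_gap:.2e}")
```

`statistics.pvariance` uses the $1/n$ normalization, so it matches the empirical gap up to rounding.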
Why it matters
Jensen gives one-line proofs of many classic inequalities:
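- Log-mean vs. mean-log (AM–GM for logs): for positive $X$, $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$.
Apply Jensen with the concave log to $X$ uniform on $x_1, \dots, x_n > 0$, exponentiate, and you get the arithmetic-mean–geometric-mean inequality for free:
$$\Big(\prod_{i=1}^{n} x_i\Big)^{1/n} \le \frac{1}{n}\sum_{i=1}^{n} x_i.$$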
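- Non-negativity of KL divergence. Apply Jensen with the convex $-\log$ to the ratio $q(X)/p(X)$, where $X \sim p$:
$$D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_p\!\left[-\log\frac{q(X)}{p(X)}\right] \ge -\log \mathbb{E}_p\!\left[\frac{q(X)}{p(X)}\right] = -\log \sum_{x:\,p(x)>0} q(x) \ge -\log 1 = 0.$$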
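- Rao–Blackwell. Replacing an estimator $\hat\theta$ with its conditional expectation $\mathbb{E}[\hat\theta \mid T]$ given a sufficient statistic $T$ never increases mean squared error, because squared error is convex. Conditional Jensen gives $(\mathbb{E}[\hat\theta \mid T] - \theta)^2 \le \mathbb{E}[(\hat\theta - \theta)^2 \mid T]$, and taking expectations finishes the argument.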
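- Log-sum-exp. Jensen with the convex $\exp$ gives $\log\big(\tfrac{1}{n}\sum_i e^{x_i}\big) \ge \tfrac{1}{n}\sum_i x_i$. The log-sum-exp behind softmax therefore always satisfies $\log\sum_i e^{x_i} \ge \log n + \tfrac{1}{n}\sum_i x_i$.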
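Each of the four applications above can be spot-checked numerically. This sketch is an illustration, not a proof. The distributions, sample sizes, and helpers (`normalize`, `kl`, `mse`) are assumptions chosen for the demo. The Rao–Blackwell part uses the textbook Bernoulli case, where conditioning the single-draw estimator $X_1$ on $T = \sum_i X_i$ gives $T/n$.

```python
import math
import random

random.seed(2)

# AM-GM: Jensen with concave log on the uniform distribution over xs.
xs = [random.uniform(0.1, 10.0) for _ in range(50)]
arith = math.fsum(xs) / len(xs)
geo = math.exp(math.fsum(math.log(x) for x in xs) / len(xs))
print(f"AM={arith:.4f} >= GM={geo:.4f}")

def normalize(ws):
    """Scale positive weights so they sum to 1."""
    total = math.fsum(ws)
    return [w / total for w in ws]

def kl(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return math.fsum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# KL >= 0 for two random distributions on the same six outcomes.
p = normalize([random.random() for _ in range(6)])
q = normalize([random.random() for _ in range(6)])
print(f"KL(p||q)={kl(p, q):.4f}  KL(p||p)={kl(p, p):.4f}")

# Log-sum-exp: Jensen with convex exp gives log(sum e^z) >= log n + mean(z).
zs = [random.gauss(0.0, 3.0) for _ in range(20)]
lse = math.log(math.fsum(math.exp(z) for z in zs))
lse_bound = math.log(len(zs)) + math.fsum(zs) / len(zs)
print(f"LSE={lse:.4f} >= bound={lse_bound:.4f}")

# Rao-Blackwell: estimate p_true from n Bernoulli draws. Conditioning the crude
# estimator X_1 on the sufficient statistic T = sum(draws) gives T / n.
def mse(estimates, truth):
    return math.fsum((e - truth) ** 2 for e in estimates) / len(estimates)

p_true, n, trials = 0.3, 10, 20_000
crude, rb = [], []
for _ in range(trials):
    draws = [1 if random.random() < p_true else 0 for _ in range(n)]
    crude.append(draws[0])     # unbiased, but uses a single draw
    rb.append(sum(draws) / n)  # E[X_1 | T] = T / n by symmetry
print(f"MSE crude={mse(crude, p_true):.4f}  MSE Rao-Blackwellized={mse(rb, p_true):.4f}")
```

For this Bernoulli setup the theoretical MSEs are $p(1-p)$ and $p(1-p)/n$, so the second estimator should come out roughly ten times smaller.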
Proof sketch
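Every convex function has a supporting line at any interior point $x_0$ of its domain: there exists a slope $c$ such that $f(x) \ge f(x_0) + c\,(x - x_0)$ for all $x$. Take $x_0 = \mathbb{E}[X]$, substitute $x = X$, and take expectations on both sides. The linear term vanishes (its expectation is $c\,(\mathbb{E}[X] - \mathbb{E}[X]) = 0$) and you’re left with $\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$.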
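The proof can also be traced numerically. This minimal sketch assumes a differentiable convex $f$, here $\exp$, so the supporting line at $x_0$ is the tangent with slope $c = f'(x_0)$. It also assumes a normal sample. Every residual $f(x) - [f(x_0) + c\,(x - x_0)]$ should be non-negative, and their mean equals the Jensen gap because the linear term averages to zero. `tangent_residuals` is a hypothetical helper name.

```python
import math
import random

def tangent_residuals(f, fprime, xs):
    """Residuals f(x) - [f(x0) + c*(x - x0)] at x0 = mean(xs), c = f'(x0).

    For a convex, differentiable f the tangent is a supporting line, so every
    residual is >= 0 (up to floating-point rounding).
    """
    x0 = math.fsum(xs) / len(xs)
    c = fprime(x0)
    return x0, [f(x) - (f(x0) + c * (x - x0)) for x in xs]

random.seed(3)
xs = [random.gauss(1.0, 0.8) for _ in range(5_000)]

# exp is its own derivative, so it serves as both f and f'.
x0, res = tangent_residuals(math.exp, math.exp, xs)
mean_res = math.fsum(res) / len(res)
gap = math.fsum(math.exp(x) for x in xs) / len(xs) - math.exp(x0)
print(f"min residual={min(res):.2e}  mean residual={mean_res:.6f}  Jensen gap={gap:.6f}")
```

If you swap in a non-convex $f$, such as `math.sin` with `math.cos` as its derivative, you should see some negative residuals. That is exactly the step where the proof breaks.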