Jensen’s inequality

If $f$ is convex, averaging before applying $f$ gives a smaller (or equal) answer than applying $f$ first and averaging:

$$f(\mathbb{E}[X]) \le \mathbb{E}[f(X)].$$

For concave $f$ the inequality reverses.

[Interactive figure: f(x) plotted against x. Sample readout: E[X] ≈ 4.178, f(E[X]) = 17.454, E[f(X)] ≈ 28.284, gap = 10.830 (convex, expected sign +).]
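X is lognormal. The indigo dot is $f(\mathbb{E}[X])$, the amber dot is $\mathbb{E}[f(X)]$. For convex f, amber sits above indigo ($\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$); for concave f, below ($\mathbb{E}[f(X)] \le f(\mathbb{E}[X])$). The gap grows with σ.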
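The sign of the gap can be checked numerically. The sketch below is a minimal Monte Carlo version of the figure, assuming illustrative lognormal parameters (mu = 1.0, sigma = 0.7) that are not taken from the original widget; `jensen_gap` is a hypothetical helper name.

```python
import math
import random
import statistics

def jensen_gap(f, samples):
    """Return E[f(X)] - f(E[X]) for the empirical distribution of `samples`."""
    mean_x = statistics.fmean(samples)
    mean_fx = statistics.fmean(f(x) for x in samples)
    return mean_fx - f(mean_x)

rng = random.Random(0)  # fixed seed so repeated runs agree
# mu and sigma are assumed values for illustration, not the figure's settings
samples = [rng.lognormvariate(1.0, 0.7) for _ in range(100_000)]

gap_square = jensen_gap(lambda x: x * x, samples)  # convex f: gap >= 0
gap_log = jensen_gap(math.log, samples)            # concave f: gap <= 0
gap_sqrt = jensen_gap(math.sqrt, samples)          # concave f: gap <= 0

print(f"x^2 gap:  {gap_square:+.3f}")
print(f"log gap:  {gap_log:+.3f}")
print(f"sqrt gap: {gap_sqrt:+.3f}")
```

Because a finite sample is itself a probability distribution, the signs hold for every sample (up to floating-point rounding), not just on average.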

What to notice

  • Convex f puts the amber dot above the indigo. $\mathbb{E}[f(X)]$ exceeds $f(\mathbb{E}[X])$. The chord joining two points on the curve always sits above the curve for convex functions; Jensen is the distributional version of that geometric fact.
  • Concave f flips the sign. Switch to log or √x. Now indigo sits above amber. Same inequality, other direction.
  • The gap grows with variance. Crank σ up. The sample spreads out, the chord gets longer, and the distance between the chord’s midpoint and the curve widens. An affine f (a straight line) means zero gap. For a non-constant X like this one, that is the only case where Jensen is an equality.
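For one specific case the growth of the gap with σ can be written in closed form. The sketch below assumes f(x) = x² and X lognormal with parameters mu and sigma, using the standard lognormal moments E[X] = exp(mu + sigma²/2) and E[X²] = exp(2·mu + 2·sigma²); the function name and the choice mu = 0 are illustrative assumptions.

```python
import math

def lognormal_square_gap(mu, sigma):
    """Jensen gap E[X^2] - (E[X])^2 for X ~ Lognormal(mu, sigma)."""
    mean = math.exp(mu + sigma**2 / 2)               # E[X]
    second_moment = math.exp(2 * mu + 2 * sigma**2)  # E[X^2]
    return second_moment - mean**2

# mu is held fixed at 0 (an assumed value); only sigma varies
sigmas = (0.0, 0.25, 0.5, 1.0)
gaps = [lognormal_square_gap(0.0, s) for s in sigmas]

for s, g in zip(sigmas, gaps):
    print(f"sigma = {s:4.2f}  gap = {g:.4f}")
```

With sigma = 0 the distribution collapses to a point and the gap is zero; each larger sigma gives a strictly larger gap. For this choice of f the gap is exactly the variance of X.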

Why it matters

Jensen is a one-line proof of uncountably many inequalities:

  • Log-mean vs. mean-log (AM–GM for logs): $\log \mathbb{E}[X] \ge \mathbb{E}[\log X]$ for positive $X$.
    Apply Jensen with the concave log, and you get the arithmetic-mean–geometric-mean inequality for free.
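  • Non-negativity of KL divergence. Apply Jensen with the convex $-\log$ to the ratio $q(X)/p(X)$, where $X \sim p$: $D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_p[-\log(q(X)/p(X))] \ge -\log \mathbb{E}_p[q(X)/p(X)] \ge -\log 1 = 0$.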
  • Rao–Blackwell. Replacing an estimator with its conditional expectation given a sufficient statistic never increases mean squared error — because squared error is convex.
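  • Log-sum-exp. Because exp is convex, Jensen gives $\log \mathbb{E}[e^{X}] \ge \mathbb{E}[X]$, so a log-mean-exp never falls below the plain mean. The softmax’s shift-invariance and numerical-stability tricks are built on this same log-sum-exp function.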
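Two of the consequences above are easy to check on small discrete examples. The sketch below assumes finite distributions given as lists of probabilities and a list of positive numbers; the helper names and the sample values are illustrative, not from the original.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing, by the usual 0 * log 0 = 0 convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """exp of the mean log, valid for strictly positive inputs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Illustrative inputs (assumed, chosen only to exercise the inequalities)
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
xs = [1.0, 4.0, 16.0]

kl_pq = kl_divergence(p, q)
kl_pp = kl_divergence(p, p)
am, gm = arithmetic_mean(xs), geometric_mean(xs)

print(f"KL(p||q) = {kl_pq:.4f}, KL(p||p) = {kl_pp:.4f}")
print(f"AM = {am:.4f}, GM = {gm:.4f}")
```

The geometric mean is computed as the exponential of a mean log, which is exactly the form in which Jensen with the concave log applies.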

Proof sketch

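Every convex function $f$ has a supporting line at any interior point $x_0$ of its domain: there exists $c$ such that $f(x) \ge f(x_0) + c\,(x - x_0)$ for all x. Take $x_0 = \mathbb{E}[X]$, substitute $x = X$, and take expectations on both sides. The linear term vanishes (its expectation is zero, since $\mathbb{E}[X - \mathbb{E}[X]] = 0$) and you’re left with $\mathbb{E}[f(X)] \ge f(\mathbb{E}[X])$.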
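As a worked instance of the supporting-line argument, take the illustrative choice $f(x) = x^2$ with $\mu = \mathbb{E}[X]$. The tangent at $\mu$ has slope $2\mu$, and the first line below holds because the difference of the two sides is $(x - \mu)^2 \ge 0$:

```latex
\begin{align*}
x^2 &\ge \mu^2 + 2\mu\,(x - \mu)
  && \text{supporting line at } x_0 = \mu,\ \text{slope } c = 2\mu \\
\mathbb{E}[X^2] &\ge \mu^2 + 2\mu\,\mathbb{E}[X - \mu] = \mu^2
  && \text{since } \mathbb{E}[X - \mu] = 0 \\
\mathbb{E}[X^2] - \mu^2 &= \operatorname{Var}(X) \ge 0
  && \text{the Jensen gap for } f(x) = x^2
\end{align*}
```

For this particular f the Jensen gap is exactly the variance, which is one concrete way to see why the gap grows as X spreads out.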