Hoeffding’s inequality

If are iid and bounded to , the sample mean concentrates around its expectation exponentially fast:

1e-41e-31e-21e-1150100150200n (sample size)tail probability (log10)
empirical (Uniform on [0, 1], 5000 reps) Hoeffding Chebyshev

At n = 30: empirical = 0.0052; Hoeffding = 0.5185; Chebyshev = 0.1235.

. Dots are empirical tail probabilities at fixed n. Hoeffding beats Chebyshev exponentially — both hold for any bounded distribution.

What to notice

  • Exponential vs. polynomial. The dashed Chebyshev bound falls like . Hoeffding falls like . Past a few dozen samples Chebyshev is already useless — Hoeffding is barely getting started.
  • Empirical dots hug the rose line. For Uniform(0, 1), Hoeffding is remarkably tight. Bounded variance alone doesn’t guarantee this — you need the full bounded-support hypothesis for the exponential rate.
  • The bound doesn’t care about the distribution’s shape. Any distribution on [a, b] gets the same rate. Sub-Gaussian refinements (like Bernstein’s inequality) replace with the variance itself when you’re willing to assume more.

Why it matters

  • PAC learning. Sample complexity for learning a hypothesis class comes from Hoeffding applied to the 0/1 losses of a finite hypothesis class — each loss is bounded, so Hoeffding gives you exponential tail probabilities.
  • Multi-armed bandits. UCB confidence radii of the form are Hoeffding deviations inverted.
  • Monte Carlo. Hoeffding is why simulation error for bounded estimators shrinks like with guaranteed probability, not just on average.

Proof idea (Chernoff-style)

Apply Markov’s inequality to , then bound the moment generating function using Hoeffding’s lemma for bounded random variables, and finally optimize over . The result is the concentration inequality above:

This template — Markov on the MGF, optimize the parameter — produces Bernstein, Bennett, and Azuma-type inequalities as variants.