Hoeffding’s inequality
If are iid and bounded to , the sample mean concentrates around its expectation exponentially fast:
empirical
(Uniform on [0, 1], 5000 reps) Hoeffding Chebyshev
At n = 30: empirical = 0.0052; Hoeffding = 0.5185; Chebyshev = 0.1235.
What to notice
- Exponential vs. polynomial. The dashed Chebyshev bound falls like . Hoeffding falls like . Past a few dozen samples Chebyshev is already useless — Hoeffding is barely getting started.
- Empirical dots hug the rose line. For Uniform(0, 1), Hoeffding is remarkably tight. Bounded variance alone doesn’t guarantee this — you need the full bounded-support hypothesis for the exponential rate.
- The bound doesn’t care about the distribution’s shape. Any distribution on [a, b] gets the same rate. Sub-Gaussian refinements (like Bernstein’s inequality) replace with the variance itself when you’re willing to assume more.
Why it matters
- PAC learning. Sample complexity for learning a hypothesis class comes from Hoeffding applied to the 0/1 losses of a finite hypothesis class — each loss is bounded, so Hoeffding gives you exponential tail probabilities.
- Multi-armed bandits. UCB confidence radii of the form are Hoeffding deviations inverted.
- Monte Carlo. Hoeffding is why simulation error for bounded estimators shrinks like with guaranteed probability, not just on average.
Proof idea (Chernoff-style)
Apply Markov’s inequality to , then bound the moment generating function using Hoeffding’s lemma for bounded random variables, and finally optimize over . The result is the concentration inequality above:
This template — Markov on the MGF, optimize the parameter — produces Bernstein, Bennett, and Azuma-type inequalities as variants.