Hoeffding’s inequality

If $X_{1}, \dots, X_{n}$ are iid with mean $\mu$ and take values in $[a, b]$, the sample mean $\bar{X}_{n}$ concentrates around $\mu$ exponentially fast:

$$P\left(|\bar{X}_{n} - \mu| \geq t\right) \leq 2 \exp\left(-\frac{2 n t^{2}}{(b - a)^{2}}\right)$$

Legend: dots = empirical (Uniform on [0, 1], 5000 reps); Hoeffding = $2 e^{-2 n t^{2} / (b - a)^{2}}$; Chebyshev = $\sigma^{2} / (n t^{2})$.

At n = 30: empirical = 0.0052; Hoeffding = 0.5185; Chebyshev = 0.1235.

Settings: n (sample size) = 30; t (threshold) = 0.15; reps = 5000.

Figure: $P(|\bar{X}_{n} - \mu| \geq t) \leq 2 \exp(-2 n t^{2} / (b - a)^{2})$. Dots are empirical tail probabilities at fixed $n$. Hoeffding's bound decays exponentially in $n$, Chebyshev's only like $1/n$; both hold for any bounded distribution.
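The three numbers quoted at n = 30 can be checked with a short simulation. The sketch below is a minimal stand-in for whatever produced the plot, not the original code: it assumes Uniform(0, 1) samples, a fixed seed chosen for illustration, and helper names (`hoeffding_bound`, `chebyshev_bound`, `empirical_tail`) that are hypothetical. The empirical value will differ slightly from run to run and from the figure quoted above.

```python
import math
import random


def hoeffding_bound(n, t, a=0.0, b=1.0):
    """Two-sided Hoeffding bound on P(|mean - mu| >= t), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * t**2 / (b - a) ** 2))


def chebyshev_bound(n, t, variance):
    """Chebyshev bound sigma^2 / (n t^2) on the same event, capped at 1."""
    return min(1.0, variance / (n * t**2))


def empirical_tail(n, t, reps, rng):
    """Fraction of simulated Uniform(0, 1) sample means at least t from 0.5."""
    mu = 0.5
    hits = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - mu) >= t:
            hits += 1
    return hits / reps


n, t, reps = 30, 0.15, 5000
rng = random.Random(0)  # seed is an arbitrary choice, for reproducibility only

emp = empirical_tail(n, t, reps, rng)
hoef = hoeffding_bound(n, t)
cheb = chebyshev_bound(n, t, variance=1.0 / 12.0)  # Var of Uniform(0, 1)

print(f"empirical = {emp:.4f}")
print(f"Hoeffding = {hoef:.4f}")
print(f"Chebyshev = {cheb:.4f}")
```

Both bounds are deterministic functions of n and t; only the empirical frequency depends on the random draws.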

What to notice

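Exponential vs. polynomial. The dashed Chebyshev bound falls like $1/n$. Hoeffding falls like $e^{-c n}$ with $c = 2 t^{2} / (b - a)^{2}$. At small $n$ the constants matter and Chebyshev can be the smaller of the two (at $n = 30$ above it is 0.1235 against 0.5185), but once $n$ is large enough the exponential rate wins by an ever-growing factor.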
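Empirical dots sit below the rose line. For Uniform(0, 1), Hoeffding is valid but conservative at moderate $n$ (0.0052 observed against a bound of 0.5185 at $n = 30$); what it captures correctly is the exponential rate of decay. Finite variance alone doesn't guarantee that rate: you need the full bounded-support hypothesis.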
The bound doesn’t care about the distribution’s shape. Any distribution on $[a, b]$ gets the same rate. Variance-sensitive refinements (like Bernstein’s inequality) replace the worst-case variance proxy $(b - a)^{2} / 4$ in the leading term with the variance itself when you’re willing to assume more, namely a known bound on the variance.
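To see where the exponential rate overtakes the polynomial one, the two bounds can be compared directly as functions of n. The sketch below assumes the same Uniform(0, 1) setting with t = 0.15 and caps both bounds at 1, since a probability bound above 1 carries no information; `first_crossover` is a hypothetical helper, not something from the original page.

```python
import math


def hoeffding_bound(n, t, a=0.0, b=1.0):
    """Two-sided Hoeffding bound, capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * t**2 / (b - a) ** 2))


def chebyshev_bound(n, t, variance):
    """Chebyshev bound sigma^2 / (n t^2), capped at 1."""
    return min(1.0, variance / (n * t**2))


def first_crossover(t, variance, a=0.0, b=1.0, n_max=100_000):
    """Smallest n at which Hoeffding is strictly below Chebyshev, or None."""
    for n in range(1, n_max + 1):
        if hoeffding_bound(n, t, a, b) < chebyshev_bound(n, t, variance):
            return n
    return None


t = 0.15
variance = 1.0 / 12.0  # Var of Uniform(0, 1)
n_star = first_crossover(t, variance)
print(f"Hoeffding first beats Chebyshev at n = {n_star}")

# Ratio Chebyshev / Hoeffding grows without bound once past the crossover.
for n in (30, 100, 200, 500):
    h = hoeffding_bound(n, t)
    c = chebyshev_bound(n, t, variance)
    print(f"n = {n:4d}  Hoeffding = {h:.3e}  Chebyshev = {c:.3e}  ratio = {c / h:.3g}")
```

For smaller thresholds t the crossover moves to larger n, because the exponent scales with $n t^{2}$.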

Why it matters

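PAC learning. Sample-complexity bounds for a finite hypothesis class $\mathcal{H}$ come from applying Hoeffding to the 0/1 loss of each hypothesis and then taking a union bound over the class. Each loss lies in $[0, 1]$, so every empirical risk has an exponential tail, and $n \geq \ln(2 |\mathcal{H}| / \delta) / (2 \varepsilon^{2})$ samples suffice for all empirical risks to be within $\varepsilon$ of the true risks with probability at least $1 - \delta$.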
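Multi-armed bandits. UCB confidence radii of the form $\sqrt{2 \ln t / n_{a}}$ are Hoeffding deviations inverted: for rewards in $[0, 1]$, set the one-sided tail $e^{-2 n_{a} r^{2}}$ equal to $t^{-4}$ and solve for $r$.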
Monte Carlo. Hoeffding is why simulation error for bounded estimators shrinks like $1/\sqrt{n}$ with guaranteed probability, not just on average.
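Each of the three applications above amounts to solving the Hoeffding bound for one of its parameters. The following sketch makes that concrete; the function names and the example values of $\varepsilon$, $\delta$ and class size are illustrative assumptions, and rewards and losses are taken to lie in $[0, 1]$ unless a range is passed explicitly.

```python
import math


def hoeffding_sample_size(eps, delta, a=0.0, b=1.0):
    """Smallest n with 2 * exp(-2 n eps^2 / (b - a)^2) <= delta (Monte Carlo use)."""
    return math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * eps**2))


def pac_sample_size(eps, delta, num_hypotheses):
    """Hoeffding plus a union bound over a finite class with 0/1 loss."""
    return math.ceil(math.log(2.0 * num_hypotheses / delta) / (2.0 * eps**2))


def ucb_radius(t, n_a):
    """Radius sqrt(2 ln t / n_a): one-sided Hoeffding tail set equal to t^-4."""
    return math.sqrt(2.0 * math.log(t) / n_a)


eps, delta = 0.05, 0.05
n_mc = hoeffding_sample_size(eps, delta)
n_pac = pac_sample_size(eps, delta, num_hypotheses=1000)

print(f"Monte Carlo: n = {n_mc} samples for error {eps} with confidence {1 - delta}")
print(f"PAC, |H| = 1000: n = {n_pac} samples")
print(f"UCB radius at round t = 100 after 10 pulls: {ucb_radius(100, 10):.4f}")
```

Note that the class size enters the PAC sample size only through a logarithm, and that halving $\varepsilon$ quadruples the required n, which is the $1/\sqrt{n}$ error rate read in the other direction.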

Proof idea (Chernoff-style)

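Write $S_{n} = X_{1} + \dots + X_{n}$. Apply Markov’s inequality to $e^{\lambda (S_{n} - E[S_{n}])}$ for $\lambda > 0$, bound the moment generating function of each centered summand using Hoeffding’s lemma for bounded random variables, and finally optimize over $\lambda$. Repeating the argument for $-S_{n}$ and adding the two tails gives the concentration inequality above: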

$$P\left(|S_{n} - E[S_{n}]| \geq n t\right) \leq 2 \exp\left(-\frac{2 n t^{2}}{(b - a)^{2}}\right)$$

This template — Markov on the MGF, optimize the parameter — produces Bernstein, Bennett, and Azuma-type inequalities as variants.
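Written out for the upper tail, the steps look as follows. This is a sketch of the standard argument, with $\mu$ denoting the common mean of the $X_{i}$ and the form of Hoeffding’s lemma assumed to be $E[e^{\lambda (X_{i} - \mu)}] \leq e^{\lambda^{2} (b - a)^{2} / 8}$.

Markov’s inequality on the exponential, for any $\lambda > 0$:

$$P\left(S_{n} - E[S_{n}] \geq n t\right) = P\left(e^{\lambda (S_{n} - E[S_{n}])} \geq e^{\lambda n t}\right) \leq e^{-\lambda n t} \, E\left[e^{\lambda (S_{n} - E[S_{n}])}\right]$$

Independence factorizes the moment generating function, and Hoeffding’s lemma bounds each factor:

$$e^{-\lambda n t} \prod_{i=1}^{n} E\left[e^{\lambda (X_{i} - \mu)}\right] \leq \exp\left(-\lambda n t + \frac{n \lambda^{2} (b - a)^{2}}{8}\right)$$

The exponent is a quadratic in $\lambda$, minimized at $\lambda^{*} = 4 t / (b - a)^{2}$:

$$-\lambda^{*} n t + \frac{n (\lambda^{*})^{2} (b - a)^{2}}{8} = -\frac{4 n t^{2}}{(b - a)^{2}} + \frac{2 n t^{2}}{(b - a)^{2}} = -\frac{2 n t^{2}}{(b - a)^{2}}$$

The lower tail satisfies the same bound by symmetry, and summing the two tails produces the factor of 2.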