The bootstrap
Got one sample. Need a standard error. Traditional approach: derive the sampling distribution of your statistic, plug in the variance formula, hope you didn’t mess up the algebra. Bootstrap approach: resample your data with replacement a few thousand times, compute the statistic on each resample, and use the standard deviation of those values as the standard error.
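The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper name `bootstrap_se`, the resample count, the seeds, and the simulated data are all assumptions made for the example, and the data are not the sample shown in the figure.

```python
import numpy as np

def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Bootstrap standard error of `statistic` on 1-D `data` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.size
    # Each row of idx is one resample: n indices drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    # Evaluate the statistic once per resample.
    replicates = np.array([statistic(data[row]) for row in idx])
    # The spread of the replicates is the bootstrap estimate of the SE.
    return replicates.std(ddof=1)

# Illustrative data only (assumed Normal source, n = 30).
sample = np.random.default_rng(42).normal(loc=1.0, scale=1.2, size=30)

se_boot = bootstrap_se(sample, np.mean)
se_clt = sample.std(ddof=1) / np.sqrt(sample.size)  # analytic s / sqrt(n)
print(f"bootstrap SE = {se_boot:.3f}, CLT SE = {se_clt:.3f}")
```

Passing the statistic as a function means the same helper works unchanged for `np.median` or any other function of a 1-D array.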
[Figure: histogram of bootstrap means (1000 resamples) with the CLT approximation N(x̄, s/√n) overlaid; the original sample mean x̄ = 0.957 is marked.]
Original sample: n = 30, x̄ = 0.957, s = 1.206. Bootstrap SE = 0.225; CLT SE = s/√n = 0.220. 95% percentile CI: [0.568, 1.443].
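The CLT figure in the caption can be checked by hand from the reported n and s. The Normal-theory interval in the second line is not in the original; it is added here only as a point of comparison, using the usual 1.96 multiplier.

```latex
\mathrm{SE}_{\text{CLT}} = \frac{s}{\sqrt{n}} = \frac{1.206}{\sqrt{30}} \approx 0.220
\qquad
\bar{x} \pm 1.96\,\mathrm{SE}_{\text{CLT}} = 0.957 \pm 0.432 \approx [0.525,\ 1.389]
```

The percentile interval [0.568, 1.443] is about the same width but sits slightly to the right of this symmetric interval.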
What to notice
- Bootstrap SE ≈ CLT SE. For the sample mean, the bootstrap standard error matches to within simulation noise. That’s the sanity check: for easy statistics the bootstrap reproduces the analytic answer.
- No CLT derivation required. Suppose the source is Lognormal(0, 1). The original n = 30 sample is badly skewed, yet the bootstrap histogram of sample means still looks roughly Normal. The averaging that drives the CLT happens inside each resample, so you get its benefit without working out the theory for the underlying distribution.
- Percentile CIs. The dashed green lines mark the 2.5th and 97.5th percentiles of the bootstrap distribution. That’s a 95% CI with no normality assumption, no degrees-of-freedom calculation.
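The points above can be tried on a skewed sample. The sketch below assumes a Lognormal(0, 1) source with n = 30 and 1000 resamples; the seed and the `skewness` helper are illustrative choices, so the numbers will not match the figure.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=30)  # skewed source (assumed)

# Draw all resamples at once: each row holds n indices sampled with replacement.
n_resamples = 1000
idx = rng.integers(0, sample.size, size=(n_resamples, sample.size))
boot_means = sample[idx].mean(axis=1)

# Percentile CI: the 2.5th and 97.5th percentiles of the bootstrap means.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

def skewness(x):
    """Plain moment-based skewness (hypothetical helper for this example)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))

print(f"95% percentile CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"skewness of raw sample:      {skewness(sample):.2f}")
print(f"skewness of bootstrap means: {skewness(boot_means):.2f}")
```

The bootstrap means are usually much less skewed than the raw sample, though with n = 30 some right skew typically remains, which is one reason the percentile interval need not be symmetric around x̄.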
Why it matters
The bootstrap shines when analytic SEs don’t exist or are too fragile:
- Median, quantiles, IQR. No clean closed-form SE; bootstrap gives you one in a few lines of code.
- Non-linear functions of the data. Correlation, R², ratios, ROC-AUC. Delta-method approximations get hairy; bootstrap just works.
- Regression coefficients with heteroskedastic or clustered errors. The wild bootstrap handles heteroskedasticity and the cluster bootstrap handles within-cluster dependence, both of which break classical standard errors.
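As a rough sketch of the first two cases in the list, the code below bootstraps the SE of a median and of a Pearson correlation. The data are simulated and every number (n, slope, noise scale, seed, resample count) is an assumption for illustration. The one detail that matters for the correlation is that (x, y) pairs are resampled together, using the same row indices for both arrays.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)  # illustrative paired data

n_resamples = 2000
idx = rng.integers(0, n, size=(n_resamples, n))  # shared resample indices

# Median: no simple closed-form SE, but the bootstrap needs only the statistic.
boot_medians = np.median(x[idx], axis=1)
se_median = boot_medians.std(ddof=1)

def corr(a, b):
    """Pearson correlation of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Correlation: index x and y with the same row so pairs stay intact.
boot_corrs = np.array([corr(x[row], y[row]) for row in idx])
se_corr = boot_corrs.std(ddof=1)

print(f"median = {np.median(x):.3f}, bootstrap SE = {se_median:.3f}")
print(f"corr   = {corr(x, y):.3f}, bootstrap SE = {se_corr:.3f}")
```

Resampling x and y independently would destroy the relationship being measured, so the shared index matrix is the design choice to keep when adapting this to ratios, R², or AUC.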
Variants worth knowing
- BCₐ (bias-corrected and accelerated). Adjusts the percentile CI for skewness and bias in the bootstrap distribution. Better coverage than raw percentile in small samples.
- Parametric bootstrap. Fit a model, then resample from the fitted model instead of from the data. Trades distribution-freeness for lower variance, and is only as good as the model you fit.
- Block bootstrap. For time series, resample blocks of consecutive observations to preserve autocorrelation.
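To make the block bootstrap concrete, here is a minimal moving-block sketch for the mean of an autocorrelated series. The AR(1) data, the coefficient 0.7, the block length of 25, and the function name `moving_block_bootstrap` are all assumptions chosen for illustration; in practice the block length has to be tuned to the series.

```python
import numpy as np

def moving_block_bootstrap(series, block_len, n_resamples, rng):
    """Resample overlapping blocks of consecutive values (hypothetical helper).

    Returns an array of shape (n_resamples, len(series)).
    """
    series = np.asarray(series)
    n = series.size
    n_blocks = int(np.ceil(n / block_len))
    # Random start position for every block in every resample.
    starts = rng.integers(0, n - block_len + 1, size=(n_resamples, n_blocks))
    # Expand each start into block_len consecutive indices, then trim to n.
    offsets = np.arange(block_len)
    idx = (starts[:, :, None] + offsets).reshape(n_resamples, -1)[:, :n]
    return series[idx]

# Illustrative AR(1) series with positive autocorrelation.
rng = np.random.default_rng(3)
n, phi = 500, 0.7
eps = rng.normal(size=n)
series = np.empty(n)
series[0] = eps[0]
for t in range(1, n):
    series[t] = phi * series[t - 1] + eps[t]

block_resamples = moving_block_bootstrap(series, block_len=25, n_resamples=2000, rng=rng)
se_block = block_resamples.mean(axis=1).std(ddof=1)

# Naive i.i.d. bootstrap for comparison: ignores the time ordering.
iid_idx = rng.integers(0, n, size=(2000, n))
se_iid = series[iid_idx].mean(axis=1).std(ddof=1)

print(f"block bootstrap SE = {se_block:.3f}, i.i.d. bootstrap SE = {se_iid:.3f}")
```

With positively autocorrelated data the i.i.d. bootstrap tends to understate the SE of the mean, because shuffling individual observations throws away the dependence that the blocks preserve.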