The bootstrap

Got one sample. Need a standard error. Traditional approach: derive the sampling distribution of your statistic, plug in the variance formula, hope you didn’t mess up the algebra. Bootstrap approach: resample your data with replacement a few thousand times, take the statistic of each resample, and use the spread of those numbers as the standard error.

[Figure: histogram of the bootstrap sample means (1000 resamples; x-axis: bootstrap sample mean, y-axis: density), overlaid with the CLT approximation N(x̄, s/√n) and a marker at the original x̄ = 0.957.]

Original sample: n = 30, x̄ = 0.957, s = 1.206. Bootstrap SE = 0.225; CLT SE = s/√n = 0.220. 95% percentile CI: [0.568, 1.443].

Resample with replacement from the single observed dataset, compute the statistic on each resample, repeat B times. For the sample mean, the bootstrap SE comes out close to s/√n, matching the CLT. For nastier statistics (median, quantiles), the bootstrap still works where analytic SEs fail.
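The recipe above fits in a few lines. What follows is a minimal sketch using only the Python standard library, not the code behind the figure: the helper name `bootstrap_se`, the seeds, and the Exponential(1) data are all assumptions made for illustration.

```python
import math
import random
import statistics


def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Standard deviation of stat over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original sample.
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


# Illustrative data (an assumption, not the article's sample):
# 30 draws from Exponential(1).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(30)]

se_boot_mean = bootstrap_se(sample, statistics.mean)
se_clt_mean = statistics.stdev(sample) / math.sqrt(len(sample))
se_boot_median = bootstrap_se(sample, statistics.median)

print(f"bootstrap SE of the mean:   {se_boot_mean:.3f}")
print(f"CLT SE of the mean (s/√n):  {se_clt_mean:.3f}")
print(f"bootstrap SE of the median: {se_boot_median:.3f}")
```

The first two numbers should agree closely. The third has no equally simple formula to compare against, which is where the bootstrap earns its keep.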

What to notice

  • Bootstrap SE ≈ CLT SE. For the sample mean, the bootstrap standard error matches to within simulation noise. That’s the sanity check: for easy statistics the bootstrap reproduces the analytic answer.
  • No Normal data required. Switch the source to Lognormal(0, 1). The original n = 30 sample is badly skewed, yet the bootstrap histogram of sample means still looks roughly Normal, with some right skew left over. Averaging smooths out the resamples the same way the CLT smooths out real samples, even when the underlying distribution resists it.
  • Percentile CIs. The dashed green lines mark the 2.5th and 97.5th percentiles of the bootstrap distribution. That’s a 95% CI with no normality assumption, no degrees-of-freedom calculation.
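A percentile interval is just two order statistics of the bootstrap replicates. The sketch below tries it on a skewed sample. It assumes a hypothetical helper `percentile_ci`, a nearest-rank rule for picking the percentiles, and a freshly simulated Lognormal(0, 1) sample rather than the one in the figure.

```python
import random
import statistics


def percentile_ci(data, stat, level=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap CI: cut the sorted replicates at both tails."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    # Nearest-rank positions of the lower and upper percentiles.
    lo_idx = round(alpha * (n_resamples - 1))
    hi_idx = round((1.0 - alpha) * (n_resamples - 1))
    return reps[lo_idx], reps[hi_idx]


# Illustrative skewed data (an assumption): 30 draws from Lognormal(0, 1).
data_rng = random.Random(7)
skewed = [data_rng.lognormvariate(0.0, 1.0) for _ in range(30)]

xbar = statistics.mean(skewed)
ci_lo, ci_hi = percentile_ci(skewed, statistics.mean)

print(f"sample mean: {xbar:.3f}")
print(f"95% percentile CI: [{ci_lo:.3f}, {ci_hi:.3f}]")
print(f"distance below / above the mean: {xbar - ci_lo:.3f} / {ci_hi - xbar:.3f}")
```

With skewed data the interval usually comes out asymmetric around x̄, something a symmetric x̄ ± 1.96·SE interval cannot show.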

Why it matters

The bootstrap shines when analytic SEs don’t exist or are too fragile:

  • Median, quantiles, IQR. No clean closed-form SE; bootstrap gives you one in a few lines of code.
  • Non-linear functions of the data. Correlation, R², ratios, ROC-AUC. Delta-method approximations get hairy; bootstrap just works.
  • Regression coefficients with heteroskedastic or clustered errors. The wild bootstrap handles heteroskedasticity and the cluster bootstrap handles within-cluster dependence, both of which break classical standard errors.
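For statistics computed from pairs, such as a correlation, the usual move is to resample whole (x, y) rows so each pair stays together. Here is a hedged sketch of that idea. The helpers `pearson_r` and `bootstrap_pairs` and the simulated data are assumptions for illustration, and the cluster and wild bootstraps mentioned above need more machinery than this.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pairs(xs, ys, stat, n_resamples=2000, seed=0):
    """Resample row indices with replacement so (x, y) pairs stay intact."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)
        reps.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    return reps


# Illustrative data (an assumption): y depends linearly on x plus noise.
data_rng = random.Random(11)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
ys = [0.6 * x + data_rng.gauss(0.0, 0.8) for x in xs]

r_hat = pearson_r(xs, ys)
reps = bootstrap_pairs(xs, ys, pearson_r)
se_r = statistics.stdev(reps)

print(f"r = {r_hat:.3f}, bootstrap SE = {se_r:.3f}")
```

Resampling x and y separately would destroy the dependence being measured, so the index-based resample is the part that matters.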

Variants worth knowing

  • BCₐ (bias-corrected and accelerated). Adjusts the percentile CI for skewness and bias in the bootstrap distribution. Better coverage than raw percentile in small samples.
  • Parametric bootstrap. Fit a model, then resample from the fitted model instead of from the data. Trades distribution-freeness for lower variance, provided the model is a reasonable fit.
  • Block bootstrap. For time series, resample blocks of consecutive observations to preserve autocorrelation.
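To make the last variant concrete, here is a rough sketch of a moving block bootstrap for the mean of an autocorrelated series, set next to the ordinary i.i.d. bootstrap. The AR(1) data, the block length of 20, and the helper names are all assumptions. Choosing the block length well is a topic in itself and is not attempted here.

```python
import math
import random
import statistics


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks, then trim to length n."""
    n = len(series)
    n_blocks = math.ceil(n / block_len)
    out = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


def se_of_mean(series, resample, n_resamples=1000, seed=0):
    """Bootstrap SE of the mean under a given resampling scheme."""
    rng = random.Random(seed)
    reps = [statistics.fmean(resample(series, rng)) for _ in range(n_resamples)]
    return statistics.stdev(reps)


# Illustrative AR(1) series (an assumption): x_t = 0.8 * x_{t-1} + noise.
data_rng = random.Random(3)
phi, n = 0.8, 300
series, x = [], 0.0
for _ in range(n):
    x = phi * x + data_rng.gauss(0.0, 1.0)
    series.append(x)

# The i.i.d. bootstrap shuffles away the autocorrelation; blocks keep it.
se_iid = se_of_mean(series, lambda s, rng: rng.choices(s, k=len(s)))
se_block = se_of_mean(series, lambda s, rng: moving_block_resample(s, 20, rng))

print(f"i.i.d. bootstrap SE of the mean: {se_iid:.3f}")
print(f"block bootstrap SE of the mean:  {se_block:.3f}")
```

With positive autocorrelation the i.i.d. version tends to understate the uncertainty, because it treats neighboring observations as independent.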