Coupon collector’s problem

A cereal brand hides one of n distinct prizes in each box. You buy boxes one at a time, each containing a uniformly random prize. How many boxes do you need to collect all n prizes?

The expected number is , where is the -th harmonic number:

050100150200020406080100draws to complete setcount
expected = 29.3 simulated mean = 29.3
Histogram of 2000 simulated collection runs for 10 unique coupons. Dashed: .

What to notice

  • The tail is long. Most runs cluster near the expected value, but some runs are much longer. The distribution is right-skewed because completing the last few coupons requires many “wasted” draws on already-collected prizes.
  • Expected grows faster than n. At n = 10, you need about 29 draws. At n = 50, about 225. The asymptotic rate is — not linear, and not quite quadratic.
  • The last coupon is the killer. When you have n−1 of n coupons, each draw has only a 1/n chance of being the missing one, so you wait n more draws on average for just the final piece.

The math

When you have already collected k distinct coupons, the probability of the next box giving a new one is (n−k)/n. The waiting time until the next new coupon is geometric with success probability (n−k)/n, and has expected value n/(n−k). Summing over k from 0 to n−1: