Gaussian mixture

A Gaussian mixture is a weighted sum of Normal densities. Each component has its own mean and standard deviation; each sample is drawn from exactly one component, chosen at random according to the mixing weight.

f (x) = w N (x ∣ μ_{1}, σ_{1}) + (1 - w) N (x ∣ μ_{2}, σ_{2})

mixture w · N(μ₁, σ₁) (1 − w) · N(μ₂, σ₂) E[X] = 0.60, Var = 3.63

w (weight on comp. 1) 0.40 μ₁ -1.5 σ₁ 1.0 μ₂ 2.0 σ₂ 0.7 n (samples) 1500

f (x) = w N (x ∣ μ_{1}, σ_{1}) + (1 - w) N (x ∣ μ_{2}, σ_{2})

. Slide μ₁ and μ₂ apart relative to σ's until two modes appear — the hallmark of a bimodal mixture.

What to notice

Bimodality is not automatic. Even with $μ_{1} \neq = μ_{2}$ , the mixture looks unimodal unless the means are separated by roughly two standard deviations. A rough rule: $∣ μ_{1} - μ_{2} ∣ > 2 σ_{1} σ_{2}$ is the tipping point.
Weight controls which mode dominates. At $w = 0.5$ the components contribute equally; push the slider and one peak drowns out the other.
Overall mean and variance decompose. The mean is the weighted average of component means. The variance picks up an extra term from the squared distance of each component from the overall mean — a special case of the law of total variance.

Why it matters

Mixtures are the stats world’s Swiss army knife for “kind of Normal but not quite”:

Clustering. Given data, fit a mixture with the EM algorithm and the posterior component probabilities tell you which cluster each point most likely came from.
Density estimation. A mixture of enough Normals can approximate any continuous density to arbitrary accuracy, much like how a Fourier series approximates periodic functions.
Heavy tails from simple ingredients. Two Normals with wildly different $σ$ produce a leptokurtic “outlier-prone” mixture — a cheap model for contaminated data.

E [X] = w μ_{1} + (1 - w) μ_{2}

Var (X) = w (σ_{1}^{2} + (μ_{1} - μ)^{2}) + (1 - w) (σ_{2}^{2} + (μ_{2} - μ)^{2})