37 Variance
Expectation is the most commonly used summary of a distribution, as it indicates where values are likely centered. However, it provides limited insight into the distribution’s overall shape. For example, two random variables might have the same mean, yet one could have values spread far from the mean while the other has values tightly clustered around it. Variance, on the other hand, describes how far values in a distribution typically deviate from the mean, offering a measure of the distribution’s dispersion.
Definition 37.1 (Variance) The variance of a random variable \(X\) is defined as \[Var(X)=E\left[(X-E(X))^{2}\right].\] By convention, variance is also denoted by the Greek letter \(\sigma^2\), where \(\sigma = \sqrt{Var(X)}\) is called the standard deviation.
Variance measures how far \(X\) typically deviates from its mean, but instead of averaging the differences, we average the squared differences to ensure both positive and negative deviations contribute. The expected deviation, \(E(X-E(X))\), is always zero, so squaring avoids this cancellation. Since variance is in squared units, we take the square root to get the standard deviation, restoring the original units.
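To make the definition concrete, here is a minimal sketch that computes \(E(X)\), \(Var(X)\), and the standard deviation for a small discrete distribution by direct pmf-weighted averaging; the values and probabilities are made up for illustration.

```python
import numpy as np

# A small made-up discrete distribution: support points and their probabilities
x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([0.1, 0.4, 0.4, 0.1])    # probabilities sum to 1

mean = np.sum(x * p)                   # E(X)
var  = np.sum((x - mean) ** 2 * p)     # Var(X) = E[(X - E(X))^2]
sd   = np.sqrt(var)                    # standard deviation

print(mean, var, sd)
```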
We can measure the dispersion of a distribution in other ways. For example, \(E(|X-E(X)|)\), the mean absolute deviation, is another possible choice, but it is less commonly used because the absolute value function is not differentiable at zero. Moreover, squaring connects to geometric concepts such as the distance formula and the Pythagorean theorem, which have useful statistical interpretations.
Definition 37.1 gives the theoretical variance of a distribution. Given a finite sample from the distribution, we estimate the variance from the sample observations: \[s^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2.\] Note that we divide by \(n-1\), not \(n\). Why? Because we want an unbiased estimator. We will discuss this in detail later, but here is a sketch of the reasoning.
First note the decomposition \[\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X} - \mu)^2.\] Take expectations of both sides, using \(E\left[(X_i - \mu)^2\right]=\sigma^2\) for each \(i\) and \(E\left[(\bar{X} - \mu)^2\right]=Var(\bar{X})=\sigma^2/n\): \[E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n-1)\sigma^2.\] Dividing by \(n-1\) therefore makes \(E(s^2)=\sigma^2\).
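As a rough sanity check of the \(n-1\) argument, the following Monte Carlo sketch (the sample size, number of replications, and normal distribution are arbitrary choices) compares the averages of the \(n-1\) and \(n\) versions of the sample variance against the true \(\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000          # small n makes the bias easy to see
sigma2 = 4.0                  # true variance of the simulated data

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
ss = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

print(np.mean(ss / (n - 1)), sigma2)   # dividing by n - 1: approximately unbiased
print(np.mean(ss / n), sigma2)         # dividing by n: systematically too small
```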
Theorem 37.1 For any random variable \(X\), \[Var(X)=E(X^{2})-(EX)^{2}.\]
Proof. Let \(\mu=E(X)\). By definition, \[\begin{aligned} Var(X) & =E\left[(X-\mu)^{2}\right]=E(X^{2}-2\mu X+\mu^{2})\\ & =E(X^{2})-2\mu E(X)+\mu^{2}=E(X^{2})-\mu^{2}.\end{aligned}\]
Example 37.1 Find the variance for \(X\sim\textrm{Bern}(p).\) Since \(X\) takes only the values 0 and 1, \(X^{2}=X\), so \(E(X^{2})=E(X)=p\). Therefore \[Var(X)=E(X^{2})-(EX)^{2}=p-p^{2}=p(1-p).\]
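A quick numerical check of Theorem 37.1 and Example 37.1, using simulated \(\textrm{Bern}(p)\) draws with an arbitrarily chosen \(p\): the shortcut \(E(X^{2})-(EX)^{2}\) estimated from the sample should be close to \(p(1-p)\).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3                                    # arbitrary success probability
x = rng.binomial(1, p, size=1_000_000)     # Bernoulli(p) draws

var_shortcut = np.mean(x ** 2) - np.mean(x) ** 2   # E(X^2) - (EX)^2
print(var_shortcut, p * (1 - p))                   # both close to 0.21
```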
Proposition 37.1 Variance has the following properties:
- \(Var(X)\geq0\)
- \(Var(X+c)=Var(X)\) for any constant \(c\)
- \(Var(cX)=c^{2}Var(X)\) for any constant \(c\)
- If \(X,Y\) are independent, \(Var(X+Y)=Var(X)+Var(Y)\).
- If \(X_{1},X_{2},\dots,X_{n}\) are independent, \({\displaystyle Var(\sum_{i=1}^{n}X_{i})=\sum_{i=1}^{n}Var(X_{i})}\).
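As a numerical illustration of these properties, the short simulation sketch below (distributions and the constant \(c\) are chosen arbitrarily) checks the shift, scaling, and independent-sum rules.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
X = rng.exponential(scale=2.0, size=N)   # Var(X) = 4
Y = rng.normal(0.0, 3.0, size=N)         # Var(Y) = 9, drawn independently of X
c = 5.0

print(np.var(X + c), np.var(X))               # shifting by a constant leaves variance unchanged
print(np.var(c * X), c**2 * np.var(X))        # scaling by c multiplies variance by c^2
print(np.var(X + Y), np.var(X) + np.var(Y))   # additivity under independence
```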
Example 37.2 (Variance of Binomial distribution) Find the variance for \(X\sim\textrm{Bin}(n,p).\) Write \(X=X_{1}+\cdots+X_{n}\), where the \(X_{i}\) are i.i.d. \(\textrm{Bern}(p)\) random variables. By the additivity of variance for independent random variables, \[Var(X)\overset{iid}{=}\sum_{i=1}^{n}Var(X_{i})=np(1-p).\]
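A simulated check of the binomial result, with \(n\) and \(p\) chosen arbitrarily: the sample variance of \(\textrm{Bin}(n,p)\) draws should be close to \(np(1-p)\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 0.4
x = rng.binomial(n, p, size=1_000_000)

print(x.var(), n * p * (1 - p))   # both close to 4.8
```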
Example 37.3 (Variance of Poisson distribution) Let \(X\sim\text{Pois}(\lambda)\). To find the variance, we first compute \(E(X^{2})\). By LOTUS,
\[\begin{aligned}E(X^{2})= & \sum_{k=0}^{\infty}k^{2}\cdot\frac{e^{-\lambda}\lambda^{k}}{k!}=e^{-\lambda}\sum_{k=1}^{\infty}k^{2}\frac{\lambda^{k}}{k!}\end{aligned}\]
Differentiate the series \(\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!}=e^{\lambda}\) with respect to \(\lambda\) and multiply both sides by \(\lambda\) to restore the exponent of \(\lambda\) to \(k\):
\[\sum_{k=1}^{\infty}k\frac{\lambda^{k}}{k!}=\lambda e^{\lambda}.\] Differentiating this identity once more and multiplying by \(\lambda\) again gives \[\sum_{k=1}^{\infty}k^{2}\frac{\lambda^{k}}{k!}=\lambda(e^{\lambda}+\lambda e^{\lambda}).\] Therefore, we have \[E(X^{2})=e^{-\lambda}(\lambda+\lambda^{2})e^{\lambda}=\lambda+\lambda^{2}.\] Finally, \[Var(X)=E(X^{2})-(E(X))^{2}=\lambda+\lambda^{2}-\lambda^{2}=\lambda.\]
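Finally, a simulated check of the Poisson result, with an arbitrarily chosen \(\lambda\): the sample estimates of \(E(X^{2})\) and \(Var(X)\) should be close to \(\lambda+\lambda^{2}\) and \(\lambda\), respectively.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)

print(np.mean(x ** 2), lam + lam ** 2)   # E(X^2) is approximately lambda + lambda^2
print(x.var(), lam)                      # Var(X) is approximately lambda
```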