21 Hypergeometric dist
Example 21.1 Let’s explore an example that appears to be Binomial but is, in fact, not a Binomial distribution. Given a 5-card hand. Find the distribution of the number of aces.
Let \(X\) be the number of aces. It is tempting to say \(X\sim \text{Bin}(5,p)\). But this not correct. Because having one ace is NOT independent from having another ace. We need to use the classical approach: \[P(X=k)=\frac{\binom{4}{k}\binom{48}{5-k}}{\binom{52}{5}}.\] This is a Hypergeometric distribution.
Suppose we have a box filled with \(w\) white and \(b\) black balls. We draw \(n\) balls out of the box with replacement. Let \(X\) be the number of white balls. Then \(X\sim \text{Bin}(n,w/(w+b))\). Since the draws are independent Bernoulli trials, each with probability \(w/(w+b)\) of success. If we instead sample without replacement, then the number of white balls follows a Hypergeometric distribution. We denote this by \(X\sim\textrm{HGeom}(w,b,n)\).
Theorem 21.1 If \(X\sim\textrm{HGeom}(w,b,n)\), then the PMF of \(X\) is \[p_{X}(k)=\frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}},\] for integers \(k\) satisfying \(0\leq k\leq w\) and \(0\leq n-k\leq b\), and \(p_{X}(k)=0\) otherwise.
In Example 21.1, the number of aces in the hand has the \(\textrm{HGeom}(4,48,5)\) distribution, which can be seen by thinking of the aces as white balls and the non-aces as black balls. The probability of having exactly three aces is \(0.0017\%\).
Example 21.2 Let \(X\sim\textrm{HGeom}(w,b,n)\). Find \(E(X)\) the expected number of white balls. Similarly, we can decompose \(X\): \[X=I_{1}+\cdots+I_{n}\] where \(I_{j}\) equals \(1\) if the \(j\)th ball is white and \(0\) otherwise. We have said that \(\{I_{j}\}\) are not independent, but the linearity of expectation still holds: \[E(X)=E(I_{1}+\cdots+I_{n})=E(I_{1})+\cdots+E(I_{n}).\] Meanwhile we have \[E(I_{j})=P(j\textrm{-th ball is white})=\frac{w}{w+b}\] since unconditionally the \(j\)th ball is equally likely to be any of the balls. Thus, \(E(X)=\frac{nw}{w+b}\).
The Binomial and Hypergeometric distributions are often confused. Both are discrete distributions taking on integer values between \(0\) and \(n\) for some \(n\), and both can be interpreted as the number of successes in \(n\) Bernoulli trials. However, a crucial part of the Binomial story is that the Bernoulli trials involved are independent. The Bernoulli trials in the Hypergeometric story are dependent, since the sampling is done without replacement.