So mathematicians will go on about all the cool properties of the Gaussian distribution, for example, that it is its own Fourier transform, that its higher-order cumulants are all zero, etc… but why should any half-way normal person be interested in it?
The answer lies in something called the "Central Limit Theorem". I'll describe the gist of it. You know back in section 1.3.8 when we were asking the question, what's the probability of getting $m$ heads when you toss a coin $n$ times? We saw that was equivalent to asking the question, what's the probability you'll make $\$m$ when you toss a coin $n$ times given the rule that you make nothing if it lands tails and \$1 if it lands heads. In this case you're summing up random variables $x_i$,
$$X = \sum_{i=1}^{n} x_i \tag{1.57}$$
This is the scenario we discussed in subsection 1.5.6, so the binomial distribution is just the probability of getting different values of $X$.
What I'm saying is that you can think of the binomial distribution as the probability distribution for the sum of a bunch of independent random variables. In this case one where each random variable takes the value $0$ with probability $1-p$ and $1$ with probability $p$.
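You can check this picture numerically. The following is a minimal sketch in Python (not from the text; the function name `binomial_pmf` and the simulation parameters are my own choices): sum up $n$ independent 0/1 variables many times, and compare the observed frequencies of each total against the binomial formula.

```python
import math
import random

def binomial_pmf(m, n, p):
    """Probability of exactly m heads in n tosses (the binomial formula)."""
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

# Empirical check: sum n Bernoulli(p) variables, as in eq. (1.57),
# many times over, and tally how often each total X comes up.
random.seed(0)
n, p, trials = 10, 0.5, 200_000
counts = [0] * (n + 1)
for _ in range(trials):
    X = sum(1 if random.random() < p else 0 for _ in range(n))
    counts[X] += 1

# Each observed frequency should sit close to the binomial formula.
for m in range(n + 1):
    assert abs(counts[m] / trials - binomial_pmf(m, n, p)) < 0.01
```

The agreement here is just the definition at work: the binomial distribution *is* the distribution of such a sum.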
We also discussed how it looked, but I left out an important observation: the binomial distribution looks more and more like a Gaussian distribution as the number of trials $n$ gets large! Check it out: in figure 1.3.9 we see a distribution that looks a lot like a brontosaurus. Let's compare them on the same graph.
Above is a figure of the two distributions, the Gaussian (green dashed) and the binomial distribution of figure 1.3.9 (red impulses), on the same graph. Even here with $n=50$ (which isn't all that big), you can see that they line up right on top of each other. The Gaussian was chosen to have the same mean and variance as the binomial distribution, which we computed earlier in sections 1.5.6 and 1.5.7.
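If you'd rather see numbers than a plot, here's a short Python check of the same claim (a sketch of mine, not from the text): compare the binomial probabilities for $n=50$, $p=.5$ against a Gaussian given the matching mean and variance.

```python
import math

n, p = 50, 0.5
mean, var = n * p, n * p * (1 - p)   # same mean and variance as the binomial

def binomial_pmf(m):
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

def gaussian_pdf(x):
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# At every integer point the two curves differ by well under 1% of the
# peak height, which is why the impulses sit right on the dashed curve.
peak = binomial_pmf(25)
for m in range(n + 1):
    assert abs(binomial_pmf(m) - gaussian_pdf(m)) < 0.01 * peak
```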
So where do the two curves differ? A Gaussian distribution is nonzero all along the real line, whereas a binomial can't be nonzero for $m<0$ or $m>n$. (If you do an experiment where you toss a coin 50 times, you can't get 51 heads, unless of course you're including the pink elephants.) So clearly at the tails these two distributions do differ. But how big is a Gaussian in that region? It's exponentially small, as you'll now verify in the following problem.
1. Suppose you toss an unbiased coin (i.e. $p=.5$) 50 times. Consider $X$, the total number of heads that you obtain.
(a) What's the average of $X$? Remember that this is also called the mean.
(b) What's the variance?
(c) Using the answers for (a) and (b), write down the corresponding Gaussian distribution that approximately fits $P(X)$.
(d) Calculate $P(51)$ and $P(-1)$ using this Gaussian approximation.
(e) What should the exact answer be?
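Once you've worked the problem by hand, you can sanity-check your numbers with a few lines of Python (my sketch, not part of the text):

```python
import math

n, p = 50, 0.5
mean = n * p              # (a) 25
var = n * p * (1 - p)     # (b) 12.5

def gaussian(x):          # (c) the Gaussian fitted to that mean and variance
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# (d) the approximation gives tiny but nonzero values, equal by symmetry,
# while (e) the exact probabilities are zero: you can't get 51 or -1
# heads out of 50 tosses.
print(gaussian(51), gaussian(-1))
```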
So we see that the distribution for the sum of $n$ independent random binary variables becomes a Gaussian for large $n$. Was this some weird mathematical curiosity that gets filed away in your head, right next to what you had last year for breakfast? No, it's a lot more general than that. If you take Bob's blades of grass, no matter what funky distribution each one has, the sum of their lengths, $X$, will also be well approximated by a Gaussian for large $n$.
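To see that the binary case wasn't special, here's a quick Python experiment (my own illustration; the uniform lengths are a stand-in for whatever funky distribution the grass actually has): give each blade a decidedly non-Gaussian length, sum up $n=100$ of them, and check that the sums behave like a Gaussian.

```python
import math
import random

# Each "blade of grass" gets a deliberately non-Gaussian length:
# uniform on [0, 1). Sum n of them, many times over.
random.seed(1)
n, trials = 100, 50_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mean = n * 0.5                  # n times the mean of one length
sd = math.sqrt(n * (1 / 12))    # variance of uniform(0,1) is 1/12

# A Gaussian puts about 68.3% of its mass within one standard
# deviation of the mean; the simulated sums should match that.
inside = sum(1 for s in sums if abs(s - mean) < sd) / trials
print(round(inside, 3))
```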
Under a fairly broad set of conditions, the sum of a large number of independent random variables will look like a Gaussian distribution. Since the mean $X/n$ is related to $X$ by a scaling factor, one can say the same thing about it: the distribution of the average of a large number of independent random variables is well approximated by a Gaussian distribution. The only thing that you need to do is fit the Gaussian with the correct mean and variance.
This is important because the mean is something you calculate all the time with experimental data. So without knowing much about your data, you know that this mean will approximately follow a Gaussian distribution. That has a lot of important consequences for statistics.
If you know the variance of each variable, $\mathrm{Var}(x)$, you can then also obtain the variance of the sum $X$ of $n$ of these variables, as we did for the binomial distribution, where it's just $n\,\mathrm{Var}(x)$. Because the mean $\overline{x}$ just divides the sum by $n$, and dividing a random variable by $n$ divides its variance by $n^2$, you arrive at the important conclusion
$$\mathrm{Var}(\overline{x}) = \mathrm{Var}\!\left(\frac{X}{n}\right) = \frac{\mathrm{Var}(x)}{n} \tag{1.58}$$
This means if you toss an unbiased coin 100 times and measure the mean proportion of heads, you'll get $.5$ plus or minus some deviation. To get that deviation, you calculate the standard error of the mean. The variance for one flip, as we saw before, was $1/4$. So the variance in the mean for 100 flips is $1/400$. To get the standard deviation, you take the square root, which gives $1/20$. So if you do the experiment once you might get a mean of $.53$, or $.44$, or $.51$. Every time you do the experiment you get a different result, but only different by about a standard deviation.
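That arithmetic is easy to check by brute force. A short Python simulation (mine, not from the text) repeats the 100-toss experiment many times and measures how much the mean proportion actually scatters:

```python
import math
import random

random.seed(2)
flips, experiments = 100, 10_000

# Repeat the 100-toss experiment many times, recording each mean.
means = []
for _ in range(experiments):
    heads = sum(1 for _ in range(flips) if random.random() < 0.5)
    means.append(heads / flips)

# The means should center on .5 and scatter with standard
# deviation about 1/20 = .05, per eq. (1.58).
avg = sum(means) / experiments
spread = math.sqrt(sum((m - avg)**2 for m in means) / experiments)
print(round(avg, 3), round(spread, 3))
```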
This is a simplified discussion of the central limit theorem. I should also state that even though the Gaussian distribution approximates the correct distribution for a mean (or a sum) well, there are circumstances where the difference can be crucial. But we're fortunate that it's normally easy to figure out when this theorem can't be used.