Now we have the mean and the variance, so we know the distribution, right? It’s just going to be a Gaussian as in eqn 1.56, with the mean and variance calculated using eqn 2.11. Leaving out the pesky normalization,
$$p(t) \propto e^{-t^2/2} \qquad (2.12)$$
where I’ve introduced a variable $t$, which you can think of as a rescaled version of $\bar{x}$:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad (2.13)$$

Here $\mu$ is the true mean and $s^2$ is the variance estimate from eqn 2.11.
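To make eqn 2.13 concrete, here’s a minimal sketch in Python. The data values and the hypothesized true mean `mu` are made up for illustration, and I’m assuming the variance estimate from eqn 2.11 is the usual $n-1$ flavor:

```python
# A minimal sketch of eqn 2.13 on a hypothetical sample.
import numpy as np

data = np.array([4.8, 5.1, 5.3, 4.9, 5.6])  # made-up measurements
mu = 5.0                                     # hypothesized true mean

n = len(data)
xbar = data.mean()              # sample mean
s = data.std(ddof=1)            # sample std dev (the n-1 estimator)

t = (xbar - mu) / (s / np.sqrt(n))  # eqn 2.13: rescaled deviation of the mean
print(f"n = {n}, xbar = {xbar:.3f}, s = {s:.3f}, t = {t:.3f}")
```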
But wait, we don’t really know the variance in that equation, only an estimate of it. And the variance itself has a variance (strap in, brain: infinite recursion looming). For example, suppose your estimate of the variance was too small and the true variance was bigger; then the distribution would be flattened out. So you want to average over a bunch of Gaussians, weighted by how plausible each width $\sigma$ is. Schematically:

$$p(t) \propto \int_0^\infty p(\sigma)\, e^{-t^2/2\sigma^2}\, d\sigma$$
We’re not going to do the math to solve this statistical quagmire. It’s not that hard, just one integration, but getting to that stage might put you to sleep. The upshot: instead of a tail that dies off quickly, the smearing over different $\sigma$s gives you a tail that dies off less rapidly. Something more like a power law,

$$p(t) \sim \frac{1}{t^n} \quad \text{for large } t.$$
So in reality, you’ll never see a Gaussian, but a close relative.
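If you want to see that fattened tail without doing the integral, here’s a quick numerical sketch. The uniform spread of $\sigma$ values is an arbitrary choice, picked just to make the effect visible; it’s not the weighting the full derivation would give you:

```python
# Numerical sketch of "averaging over a bunch of Gaussians".
# The uniform spread of sigmas is arbitrary, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 6, 7)                     # points to probe the tail
sigmas = rng.uniform(0.5, 2.0, size=10_000)  # plausible sigma values

# Average the Gaussians over the sigmas (all have height 1 at t = 0).
smeared = np.mean(np.exp(-t**2 / (2 * sigmas[:, None]**2)), axis=0)
gauss = np.exp(-t**2 / 2)                    # a single sigma = 1 Gaussian

for ti, g, sm in zip(t, gauss, smeared):
    print(f"t = {ti:.0f}: gaussian {g:.1e}, smeared {sm:.1e}")
```

Out past a couple of sigma, the smeared values come out far above the pure Gaussian ones.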
The good thing is that this problem was solved a long time ago. The name of this distribution is “Student’s t-distribution”. Strange? When I was a student, I thought it was called that because it was a mickey-mouse test that real men didn’t use, only students in chem labs. Nope, it’s stranger than that. The guy who figured this out, William Sealy Gosset, was a statistician and a chemist working for Guinness beer (talk about perks of the job) in Dublin. The only downside was that the company didn’t want him publishing under his own name; who knows why. So Gosset had to work in the closet. (Har har.) He published under the nom de plume “Student” instead.
The distribution depends on the number of data points you’ve got, because the more data, the closer to a Gaussian it’ll be. So if $n$ is the number of data points, the t-distribution is a function of a variable $t$ and $n$. It’s
$$p(t, n) \propto \left(1 + \frac{t^2}{n-1}\right)^{-n/2} \qquad (2.14)$$
I’ve left out the exact prefactor because it’s a little nasty.
That variable $t$ is the normalized version of $\bar{x}$ defined above in eqn 2.13.
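If you have scipy handy, you can check eqn 2.14 against a library implementation: apart from the nasty prefactor we left out, the formula should agree with the t-distribution pdf for $n-1$ degrees of freedom, so their ratio should come out flat:

```python
# Check eqn 2.14 (sans prefactor) against scipy's t-distribution pdf.
import numpy as np
from scipy import stats

n = 5                                    # number of data points
t = np.linspace(-4, 4, 9)

ours = (1 + t**2 / (n - 1)) ** (-n / 2)  # eqn 2.14, prefactor omitted
theirs = stats.t.pdf(t, df=n - 1)        # n - 1 degrees of freedom

print(theirs / ours)                     # constant = the missing prefactor
```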
So all that’s left is to integrate this distribution to find the area under the curve. Fortunately we have computers to do such things for us (although in principle you could make quite a bit of headway analytically).
As just mentioned, the distribution is a function of two things, $t$ and $n$, the total number of data points. When you use a program or a table to calculate the area under the curve, it will normally refer to something called the number of “degrees of freedom”. In this case, that’s just $n-1$.
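Here’s what that step looks like in practice, using scipy’s t-distribution. The sample size and observed t value are invented, just to show the mechanics:

```python
# Tail area under the t-distribution via scipy. The numbers n = 5 and
# t_obs = 2.3 are hypothetical, for illustration only.
from scipy import stats

n = 5          # number of data points
t_obs = 2.3    # observed value of the t statistic (eqn 2.13)
dof = n - 1    # "degrees of freedom", in table-speak

# sf is the survival function: the area in the upper tail beyond t_obs.
# Doubling it gives the two-sided tail area.
p = 2 * stats.t.sf(abs(t_obs), df=dof)
print(f"dof = {dof}, two-sided tail area = {p:.3f}")
```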
So let’s summarize what we now need to do and then work through an example.