
2.8.1 Variance of mean differences

We want to estimate the variance of the difference in the means. We already know how to estimate the variance of the height of men, $\sigma_{m}^{2}$, and the corresponding quantity for women, $\sigma_{w}^{2}$; it's done in eqn 2.9. Now call the ith data point for the men $x_{m,i}$ and for the women call it $x_{w,i}$. Call the numbers of data points $n_{m}$ and $n_{w}$, respectively, because they might be different. So the variance of the difference of the means is

$$Var(\Delta\mu)=Var\left({1\over n_{m}}\sum_{i}x_{m,i}-{1\over n_{w}}\sum_{i}x_{w,i}\right) \tag{2.10}$$

Now we've got to do the same kind of manipulation that we've been doing before, except now it's more involved. If you assume that all the true variances are equal (the true ones, not the estimates $\sigma_{m}^{2}$ and $\sigma_{w}^{2}$), then you get $Var(x_{m,1})(1/n_{m}+1/n_{w})$.
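For readers who want the intermediate steps, the manipulation can be sketched as follows. Here $\sigma^{2}$ is shorthand introduced only for this sketch, standing for the assumed common true variance $Var(x_{m,1})$. The sketch uses two facts: variances of independent data add, and pulling a constant $a$ out of a variance gives $a^{2}$ (so the minus sign in front of the women's mean disappears).

$$
\begin{aligned}
Var(\Delta\mu) &= {1\over n_{m}^{2}}\sum_{i}Var(x_{m,i})+{1\over n_{w}^{2}}\sum_{i}Var(x_{w,i}) \\
&= {n_{m}\sigma^{2}\over n_{m}^{2}}+{n_{w}\sigma^{2}\over n_{w}^{2}} \\
&= \sigma^{2}\left({1\over n_{m}}+{1\over n_{w}}\right)
\end{aligned}
$$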

But now we need to figure out how to estimate this. We know that the means could be different, and the test we're devising is supposed to decide whether they are or not. So we're going to come up with an estimate for this variance under the assumption of equal variances and possibly unequal means. When all the smoke clears, you get that the unbiased estimate is

$$Var(\Delta\mu)=\left({(n_{m}-1)\sigma_{m}^{2}+(n_{w}-1)\sigma_{w}^{2}\over n_{m}+n_{w}-2}\right)\left({1\over n_{m}}+{1\over n_{w}}\right) \tag{2.11}$$

But you can get the gist of what's going on as follows. We've seen that variances of independent data add. So on the right hand side of eqn 2.10, we can just add the variances of the two terms separately. When you average, you know that the variances of all the terms like $Var(x_{m,i})$ are the same by assumption. That basically gives you the right hand side of eqn 2.11, but not exactly, because the estimate you get that way is biased. To make it unbiased, you've got to put in that $-2$ in the denominator.

I'm not trying to give a detailed derivation at this stage, but it is worth understanding how the equation behaves and where it comes from intuitively. Without the $({1\over n_{m}}+{1\over n_{w}})$, this is just an estimate of the pooled variance of the height. Those factors like $(n_{m}-1)$ cancel against the variance estimates from eqn 2.9. So this is just like the total variance assuming all the data is from the same population.
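The pooled variance described above can be sketched in a few lines of Python. This assumes that the estimate in eqn 2.9 is the usual sample variance with an $n-1$ denominator, which is what `statistics.variance` computes. The function name `pooled_variance` and the height values are made up for illustration.

```python
from statistics import variance  # sample variance with the n-1 denominator

def pooled_variance(men, women):
    """First factor of eqn 2.11: the pooled variance estimate (hypothetical helper)."""
    n_m, n_w = len(men), len(women)
    # Each (n - 1) factor undoes the denominator of the per-group variance,
    # leaving the summed squared deviations; n_m + n_w - 2 then renormalizes.
    return ((n_m - 1) * variance(men) + (n_w - 1) * variance(women)) / (n_m + n_w - 2)

# Made-up heights in cm, for illustration only.
men = [178.0, 182.0, 175.0, 180.0]
women = [165.0, 160.0, 170.0]

pooled = pooled_variance(men, women)
print(variance(men), variance(women), pooled)
```

The pooled value is a weighted average of the two per-group variances, so it always lands between them, with the larger group getting more weight.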

We learned in eqn 1.58 that if we have independent data points, the variance of the mean is just $1/n$ times the variance of the data. Since we're interested in the deviation of the difference of the means, this involves both the men and the women. So we just add those two mean-variances together. This gives the factor $({1\over n_{m}}+{1\over n_{w}})$. If either $n_{m}$ or $n_{w}$ is small, our estimate for this difference is rather shaky, as is to be expected, so you get a big variance.
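A quick simulation can make this behavior concrete. The sketch below is only a sanity check under stated assumptions: heights are drawn from normal distributions with a common standard deviation `SIGMA` and different means `MU_M` and `MU_W` (all three values are invented), and the per-group variance is taken to be the $n-1$ sample variance. It compares the exact value $Var(x_{m,1})(1/n_{m}+1/n_{w})$ with the observed variance of the difference of the means and with the average of the eqn 2.11 estimate.

```python
import random
from statistics import mean, variance

random.seed(0)              # fixed seed so the run is repeatable
SIGMA = 7.0                 # assumed common true standard deviation (made up)
MU_M, MU_W = 178.0, 165.0   # assumed true means, allowed to differ (made up)

def simulate(n_m, n_w, trials=5000):
    """Return (observed variance of the mean difference, average eqn 2.11 estimate)."""
    diffs, estimates = [], []
    for _ in range(trials):
        men = [random.gauss(MU_M, SIGMA) for _ in range(n_m)]
        women = [random.gauss(MU_W, SIGMA) for _ in range(n_w)]
        diffs.append(mean(men) - mean(women))
        pooled = ((n_m - 1) * variance(men) + (n_w - 1) * variance(women)) / (n_m + n_w - 2)
        estimates.append(pooled * (1 / n_m + 1 / n_w))
    return variance(diffs), mean(estimates)

results = {}
for n_m, n_w in [(3, 30), (30, 30)]:
    exact = SIGMA**2 * (1 / n_m + 1 / n_w)
    empirical, avg_estimate = simulate(n_m, n_w)
    results[(n_m, n_w)] = (exact, empirical, avg_estimate)
    print(n_m, n_w, exact, empirical, avg_estimate)
```

With only three men in the sample, the factor $1/n_{m}$ dominates and the variance of the difference comes out several times larger than when both groups have thirty members, even though the women's sample is the same size in both cases.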