2.8.1 Variance of mean differences

We want to estimate the variance of the difference in the means. We already know how to estimate the variance of the height of men, $\sigma_m^2$, and the corresponding quantity for women, $\sigma_w^2$; it’s done in eqn 2.9. Now call the $i$th data point for the men $x_{m,i}$ and for the women call it $x_{w,i}$. Call the number of men $n_m$ and the number of women $n_w$, because they might be different. So the variance of the difference of the means is

$$\mathrm{Var}(\Delta\mu) = \mathrm{Var}\left(\frac{1}{n_m}\sum_i x_{m,i} - \frac{1}{n_w}\sum_i x_{w,i}\right) \tag{2.10}$$

Now we’ve got to do the same kind of manipulation that we’ve been doing before, except now it’s more involved. If you assume that all the true variances are equal (not the estimates $\sigma_m$ and $\sigma_w$), then you get $\mathrm{Var}(x_{m,1})\left(\frac{1}{n_w}+\frac{1}{n_m}\right)$.
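
If you’d like to see this come out numerically, here’s a rough simulation sketch. Everything specific in it is an assumption made up for illustration: Gaussian heights, a common true standard deviation of 7, group sizes of 5 and 20, and the two true means. It draws many pairs of samples, records the difference of the sample means each time, and compares the spread of those differences with the formula above.

```python
import random
from statistics import variance

random.seed(0)  # fixed seed so the run is repeatable

# Illustrative assumptions, not values from the text
sigma = 7.0          # common true standard deviation of both groups
n_m, n_w = 5, 20     # number of men and women in each simulated sample
mean_m, mean_w = 178.0, 165.0  # true means (they are allowed to differ)

def one_difference():
    """Draw one sample of each group and return the difference of sample means."""
    men = [random.gauss(mean_m, sigma) for _ in range(n_m)]
    women = [random.gauss(mean_w, sigma) for _ in range(n_w)]
    return sum(men) / n_m - sum(women) / n_w

# Repeat the experiment many times and measure the spread of the difference
diffs = [one_difference() for _ in range(20000)]
empirical = variance(diffs)

# Var(x)(1/n_w + 1/n_m) with Var(x) = sigma**2
predicted = sigma**2 * (1 / n_w + 1 / n_m)

print(f"empirical: {empirical:.3f}, predicted: {predicted:.3f}")
```

The two numbers should agree to within sampling noise, and changing the true means leaves the variance of the difference alone, since only the spread of the data enters.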

But now we need to figure out how to estimate this. We know that the means could be different, and the test we’re devising is supposed to decide whether they are or not. So we’re going to come up with an estimate for this variance with the assumption of equal variances and possibly unequal means. When all the smoke clears you get that the unbiased estimate for this is

$$\mathrm{Var}(\Delta\mu) = \frac{(n_m-1)\sigma_m^2 + (n_w-1)\sigma_w^2}{n_m+n_w-2}\left(\frac{1}{n_m}+\frac{1}{n_w}\right) \tag{2.11}$$
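
As a minimal sketch of how eqn 2.11 could be evaluated on actual data, here’s a small Python function. The function name and the sample heights are made up for illustration, and it assumes that the variance estimates of eqn 2.9 divide by $n-1$, which is what `statistics.variance` does.

```python
from statistics import variance

def var_of_mean_difference(men, women):
    """Estimate Var(delta mu) from two samples, following the form of eqn 2.11."""
    n_m, n_w = len(men), len(women)
    # statistics.variance divides by (n - 1); assumed to match eqn 2.9
    s2_m = variance(men)
    s2_w = variance(women)
    # Pooled variance: the (n - 1) weights undo the divisions above,
    # and the total is divided by n_m + n_w - 2
    pooled = ((n_m - 1) * s2_m + (n_w - 1) * s2_w) / (n_m + n_w - 2)
    # The variance of each mean contributes 1/n, and the two add
    return pooled * (1 / n_m + 1 / n_w)

# Hypothetical heights in cm, purely for illustration
men = [178.0, 182.0, 175.0, 180.0]
women = [165.0, 160.0, 170.0]
estimate = var_of_mean_difference(men, women)
print(estimate)
```

Each group needs at least two data points, otherwise `statistics.variance` raises an error because the sample variance is undefined.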

But you can get the gist of what’s going on as follows. We’ve seen that variances of independent data add. So on the right hand side of eqn 2.10, we can just add the variances of the two terms separately. When you average, you know that the variances of all the terms like $\mathrm{Var}(x_{m,i})$ are the same by assumption. That basically gives you the right hand side of eqn 2.11, but it’s not totally right because this is a biased estimate. To make it unbiased, you’ve got to put in that $-2$ in the denominator.

I’m not trying to give a detailed derivation at this stage, but it is worth understanding how the equation behaves and where it comes from intuitively. Without the $\left(\frac{1}{n_m}+\frac{1}{n_w}\right)$, this is just an estimate of the pooled variance of the height. Those factors like $(n_m-1)$ cancel out with the variance estimate in eqn 2.9. So this is just like the total variance assuming all the data is from the same population.

We learned in eqn 1.58 that if we have independent data points, the variance of the mean is just $1/n$ times the variance of the data. Since we’re interested in the deviation of the difference of the means, this involves both the men and the women. So we just add those two mean-variances together. This gives the factor $\left(\frac{1}{n_m}+\frac{1}{n_w}\right)$. If either $n_m$ or $n_w$ is small, this makes our estimate for this difference rather shaky, as is to be expected, so you get a big variance.
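
To get a feel for how much one small group hurts, here’s a tiny sketch that just evaluates the sample-size factor for two made-up cases. The function name and the group sizes are hypothetical, chosen only to show the effect.

```python
def size_factor(n_m, n_w):
    """The factor (1/n_m + 1/n_w) that multiplies the pooled variance."""
    return 1 / n_m + 1 / n_w

# 100 people split evenly between the two groups
balanced = size_factor(50, 50)

# Over 1000 people in total, but only 2 of them are men
lopsided = size_factor(2, 1000)

print(balanced, lopsided)
```

The lopsided case has ten times as many people overall, yet its factor is more than ten times larger, because the $1/n_m$ term from the tiny group dominates the sum.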