1.5.7 Variance: just how fat is your distribution?

The average of a probability distribution tells you a lot about the real world. What's the average height of the lawn? How much money do you expect to lose, on average, playing video poker in Las Vegas? What is the life expectancy of a chipmunk?

But aside from knowing this, you'd like to know how close to that average you expect to be. That is, when Bob looks at some random blade of grass, he might on average get 6 cm, but he might find that in one instance it's 5.5 cm and in another 6.9 cm. But does he expect to find 10 cm? The complete distribution tells you that, but you just want one number to tell you how close, for example 6 ± 1 cm. This is a measure of how sharply peaked, or how wide, the distribution is.

There are lots of different definitions you could come up with, but we'll use the same one that everyone else does: the variance.

$$\mathrm{Var}(x) = \langle (x - \langle x\rangle)^2 \rangle \qquad (1.38)$$

This means we take the difference between an outcome and the mean, square it, and then average.

In terms of discrete probabilities this is:

$$\mathrm{Var}(x) = \sum_x (x - \langle x\rangle)^2 P(x) \qquad (1.39)$$

and for a continuous distribution:

$$\mathrm{Var}(x) = \int (x - \langle x\rangle)^2 P(x)\,dx \qquad (1.40)$$

This has the units of $x^2$ and is often referred to as $\sigma^2$. The standard deviation is then just $\sigma = \sqrt{\mathrm{Var}(x)}$.
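Eqns. (1.39) and (1.40) translate almost line for line into code. Here is a minimal Python sketch, assuming a discrete distribution is given as a dictionary mapping each outcome $x$ to $P(x)$, and a continuous one as a density on a finite interval $[a, b]$. The function names are hypothetical, and the integral is approximated with a simple midpoint rule rather than computed exactly.

```python
import math
from fractions import Fraction


def discrete_variance(dist):
    """Variance of a discrete distribution, eqn. (1.39).

    `dist` maps each outcome x to its probability P(x).
    """
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


def continuous_variance(pdf, a, b, steps=100_000):
    """Variance of a continuous distribution on [a, b], eqn. (1.40).

    Both integrals (for the mean and for the variance) use the midpoint rule.
    """
    dx = (b - a) / steps
    xs = [a + (k + 0.5) * dx for k in range(steps)]
    mean = sum(x * pdf(x) * dx for x in xs)
    return sum((x - mean) ** 2 * pdf(x) * dx for x in xs)


# A fair six-sided die: P(x) = 1/6 for x = 1..6, kept exact with Fraction.
die = {x: Fraction(1, 6) for x in range(1, 7)}
var_die = discrete_variance(die)
print(var_die, math.sqrt(var_die))  # 35/12 and sigma ≈ 1.708

# Uniform density on [0, 1]; the exact variance is 1/12.
print(continuous_variance(lambda x: 1.0, 0.0, 1.0))
```

Using `Fraction` for the discrete case keeps the arithmetic exact, so the result can be compared directly with a hand calculation.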

For those who remember their classical mechanics, note that the variance corresponds to the moment of inertia, about the center of mass, of an object of mass 1 whose mass is distributed according to $P(x)$.

Let’s do some examples. What’s the variance for the distribution in fig. 1.1? That’s the case where you flip a coin and you get 1 for heads and -1 for tails. In that case the mean was zero, so applying eqn. 1.39 you have:

$$\sigma^2 = (1-0)^2\,\tfrac{1}{2} + (-1-0)^2\,\tfrac{1}{2} = 1 \qquad (1.41)$$

So the variance in this case is 1. For this trial you expect to make 0±1.
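To see what "0 ± 1" means in practice, one can simulate the coin. This is a Monte Carlo sketch using Python's standard `random` and `statistics` modules; the sample size and seed are arbitrary choices, so the printed numbers only approach the exact mean 0 and variance 1 rather than hitting them exactly.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Each trial pays +1 for heads and -1 for tails, with equal probability.
flips = [random.choice((1.0, -1.0)) for _ in range(100_000)]

mean = statistics.fmean(flips)
var = statistics.pvariance(flips)  # average of (x - mean)^2 over the sample
print(f"mean ≈ {mean:.4f}, variance ≈ {var:.4f}")
```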

Lastly, there’s an interesting identity for the variance that sometimes simplifies calculations:

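$$\mathrm{Var}(x) = \langle (x - \langle x\rangle)^2\rangle = \langle x^2 - 2x\langle x\rangle + \langle x\rangle^2\rangle \qquad (1.42)$$
$$= \langle x^2\rangle - 2\langle x\rangle\langle x\rangle + \langle x\rangle^2 = \langle x^2\rangle - 2\langle x\rangle^2 + \langle x\rangle^2 = \langle x^2\rangle - \langle x\rangle^2 \qquad (1.43)$$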
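The identity is easy to check numerically. Below is a small sketch that computes the variance both ways, from the definition (1.42) and from $\langle x^2\rangle - \langle x\rangle^2$ (1.43), using exact fractions; the lopsided three-outcome distribution is made up purely for illustration.

```python
from fractions import Fraction


def var_by_definition(dist):
    """<(x - <x>)^2>, eqn. (1.42), for a dict {outcome: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


def var_by_identity(dist):
    """<x^2> - <x>^2, eqn. (1.43)."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x * x * p for x, p in dist.items())
    return mean_sq - mean**2


# An arbitrary distribution; the probabilities 1/5 + 1/2 + 3/10 sum to 1.
dist = {0: Fraction(1, 5), 2: Fraction(1, 2), 7: Fraction(3, 10)}
print(var_by_definition(dist), var_by_identity(dist))  # both print 709/100
```

The second form only needs the two averages $\langle x\rangle$ and $\langle x^2\rangle$, which is why it often simplifies hand calculations.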

Now we’ll move on to a more complex example.

*Variance of binomial distro

OK, what's that "*" doing up there? Well, it means that if you look down the page and start to feel like you might lose your lunch, just skip to eqn. 1.55 at the end of this section to see the final result.

Now let's do a simple variation on this. As in problem 1.5.5, you get 0 for tails and 1 for heads, so the mean for one trial is $\frac{1}{2}$. What's the variance?

$$\sigma^2 = \left(0-\tfrac{1}{2}\right)^2\tfrac{1}{2} + \left(1-\tfrac{1}{2}\right)^2\tfrac{1}{2} = \tfrac{1}{4} \qquad (1.44)$$

This gives a standard deviation of $\sigma = \frac{1}{2}$. This makes sense: it's the same as the previous example except that the two peaks are now separated by half the distance, so you expect half the width.
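The "half the distance, half the width" argument can be made concrete. The 0/1 coin is the ±1 coin with every outcome mapped by $x \to (x+1)/2$: the shift by 1 doesn't change the spread, and the factor $\frac{1}{2}$ scales the variance by $(\frac{1}{2})^2$. A short sketch with exact fractions (the helper name is hypothetical):

```python
from fractions import Fraction


def variance(dist):
    """Variance of a discrete distribution given as {outcome: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


half = Fraction(1, 2)
plus_minus = {1: half, -1: half}  # the +1/-1 coin of eqn. (1.41)

# Map each outcome x -> (x + 1) / 2, which sends -1 -> 0 and +1 -> 1.
zero_one = {Fraction(x + 1, 2): p for x, p in plus_minus.items()}

print(variance(plus_minus), variance(zero_one))  # 1 and 1/4
```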

Now let's consider what you get with 10 tosses. The variance becomes $\sigma^2 \to 10\sigma^2$, and for $N$ tosses $\sigma^2 = N\cdot\frac{1}{4}$. The mean, as we found, was $N/2$. So both are proportional to $N$.

How do I know this is the answer? What, you think I looked it up? You can do this following the methods we used in section 1.5.6.
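Before doing it properly, one can at least check the claim $\sigma^2 = N/4$ by brute force for small $N$: list all $2^N$ equally likely head/tail sequences, count the heads in each, and average the squared deviations. This sketch uses only `itertools.product` and exact fractions; the function name is hypothetical, and the approach is only practical for small $N$ since the work doubles with every extra toss.

```python
from fractions import Fraction
from itertools import product


def heads_variance_brute_force(n):
    """Variance of the number of heads in n fair 0/1 tosses.

    Enumerates all 2**n equally likely sequences of outcomes.
    """
    counts = [sum(seq) for seq in product((0, 1), repeat=n)]
    total = len(counts)  # 2**n sequences
    mean = Fraction(sum(counts), total)
    return sum((c - mean) ** 2 for c in counts) / total


for n in (1, 2, 5, 10):
    print(n, heads_variance_brute_force(n))  # 1/4, 1/2, 5/4, 5/2, i.e. n/4
```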

We can write down the variance directly from the binomial distribution, just like we did for the mean (eqn. 1.36).

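$$\langle (X - \langle X\rangle)^2\rangle = \sum_{X=0}^{n} (X - \langle X\rangle)^2 P(X) = \sum_{X=0}^{n} (X - \langle X\rangle)^2 \binom{n}{X} p^X (1-p)^{n-X}$$
$$= \sum_{X=0}^{n} (X - np)^2 \frac{n!}{(n-X)!\,X!}\, p^X (1-p)^{n-X} \qquad (1.47)$$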
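For small $n$ the sum in eqn. (1.47) can simply be evaluated term by term, which gives a check on the closed form the section arrives at in eqn. 1.55. A sketch using Python's `math.comb` for the binomial coefficient and `Fraction` so that the comparison is exact; the choice $p = \frac{1}{3}$ is arbitrary.

```python
from fractions import Fraction
from math import comb


def binomial_variance_direct(n, p):
    """Evaluate the sum in eqn. (1.47) term by term, with mean np."""
    mean = n * p
    return sum(
        (x - mean) ** 2 * comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )


p = Fraction(1, 3)
for n in (1, 4, 9):
    # Columns: n, the direct sum, and np(1 - p) for comparison.
    print(n, binomial_variance_direct(n, p), n * p * (1 - p))
```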

As with the mean, you can use a bad-ass differentiation trick to get the answer; it's a bit messy, but it works. Instead, I'll follow the same kind of reasoning we used for the mean.

So we use the same variable as before, $X = \sum_{i=1}^{n} x_i$, where $x_i$ is the outcome (0 or 1) of the $i$th trial. Now we want to calculate the variance, so let's see how far we can get:

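$$\sigma^2 = \langle (X - \langle X\rangle)^2\rangle = \left\langle \left(\sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \langle x_i\rangle\right)^2 \right\rangle = \left\langle \left(\sum_{i=1}^{n} [x_i - \langle x_i\rangle]\right)^2 \right\rangle \qquad (1.48)$$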

Well, we've got pretty far, but not quite far enough. Now we have the square of a sum. How do you handle that? Think of the case $n=2$: $(x_1+x_2)^2 = x_1x_1 + x_1x_2 + x_2x_1 + x_2x_2$. This looks like a double summation over two indices. Generalizing this, we see that the summation in the above equation can be written as a double sum:

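$$\sigma^2 = \left\langle \sum_{i=1}^{n}\sum_{j=1}^{n} [x_i - \langle x_i\rangle][x_j - \langle x_j\rangle] \right\rangle \qquad (1.49)$$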
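The step from the square of a sum to a double sum is pure algebra, so it holds for any list of numbers. A tiny check, with arbitrary integers standing in for the terms $x_i - \langle x_i\rangle$ so that the comparison is exact:

```python
# Arbitrary integers standing in for the terms x_i - <x_i>.
a = [3, -1, 4, 1, -5, 9]

square_of_sum = sum(a) ** 2
double_sum = sum(a[i] * a[j] for i in range(len(a)) for j in range(len(a)))
print(square_of_sum, double_sum)  # 121 121
```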

But the average of a sum is the sum of the averages (even for a double sum), so

$$\sigma^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} \langle (x_i - \langle x_i\rangle)(x_j - \langle x_j\rangle)\rangle \qquad (1.50)$$

Now there are two kinds of terms in this double sum: those with $i=j$ and those with $i\neq j$. If $i\neq j$, for example $[x_1 - \langle x_1\rangle][x_2 - \langle x_2\rangle]$, then we can use independence to figure out the answer. In that case

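$$\langle (x_1 - \langle x_1\rangle)(x_2 - \langle x_2\rangle)\rangle = \langle x_1 - \langle x_1\rangle\rangle\,\langle x_2 - \langle x_2\rangle\rangle \qquad (1.51)$$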

This is because, as we saw in section 1.5.6, for independent variables the average of the product is the product of the averages. But $\langle x_1\rangle$ is just a constant, so averaging it doesn't change it. So

$$\langle x_1 - \langle x_1\rangle\rangle = \langle x_1\rangle - \langle x_1\rangle = 0 \qquad (1.52)$$
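The vanishing of a cross term can also be checked by brute force: enumerate the four joint outcomes of two independent 0/1 trials, weight each by $P(x_1)P(x_2)$ (that product is exactly what independence means here), and average $(x_1 - p)(x_2 - p)$. A sketch with exact fractions; the function name and the sample values of $p$ are illustrative choices.

```python
from fractions import Fraction
from itertools import product


def cross_term(p):
    """<(x1 - <x1>)(x2 - <x2>)> for two independent 0/1 trials with P(1) = p.

    Independence: the joint probability of (x1, x2) is P(x1) * P(x2).
    """
    prob = {0: 1 - p, 1: p}
    return sum(
        (x1 - p) * (x2 - p) * prob[x1] * prob[x2]
        for x1, x2 in product((0, 1), repeat=2)
    )


for p in (Fraction(1, 2), Fraction(1, 3), Fraction(9, 10)):
    print(p, cross_term(p))  # the cross term is 0 every time
```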

So all terms with $i\neq j$ vanish. We're left only with the terms with $i=j$, in other words a single sum:

$$\sigma^2 = \sum_{i=1}^{n} \langle (x_i - \langle x_i\rangle)^2\rangle \qquad (1.53)$$

Well, each term in the sum is just the variance for a single trial. Let's compute it, say for $x_1$, using $\langle x_1\rangle = p$:

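$$\langle (x_1 - \langle x_1\rangle)^2\rangle = \langle (x_1 - p)^2\rangle = (0-p)^2 P(0) + (1-p)^2 P(1)$$
$$= p^2(1-p) + (1-p)^2 p = p(1-p)\,[p + (1-p)] = p(1-p) \qquad (1.54)$$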
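The algebra in eqn. (1.54) can be spot-checked by comparing the unsimplified expression $(0-p)^2 P(0) + (1-p)^2 P(1)$ with $p(1-p)$ on a grid of exact rational values of $p$. This is a numerical check on sample points, not a symbolic proof; the function name is hypothetical.

```python
from fractions import Fraction


def single_trial_variance(p):
    """(0 - p)^2 P(0) + (1 - p)^2 P(1), the first line of eqn. (1.54)."""
    return (0 - p) ** 2 * (1 - p) + (1 - p) ** 2 * p


# Compare with the simplified form p(1 - p) for p = 0, 0.1, ..., 1.
for k in range(11):
    p = Fraction(k, 10)
    assert single_trial_variance(p) == p * (1 - p)

print(single_trial_variance(Fraction(1, 2)))  # 1/4, the fair-coin value of eqn. (1.44)
```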

Good, you're still awake! Must have had some pretty strong coffee. Well, we're almost there. Now we sum over all $n$ terms, each equal to $p(1-p)$, and obtain

$$\sigma^2 = np(1-p) \qquad (1.55)$$

That's a pretty simple answer. I could've just written it down and left it at that, but that would take away all the fun.
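Eqn. (1.55) also holds for a biased coin, which is easy to see in a simulation. The sketch below repeats an $n$-toss experiment many times, treating `random.random() < p` as one toss of a coin that lands heads with probability $p$, and compares the sample mean and variance of the head counts with $np$ and $np(1-p)$. The values of `n`, `p`, `runs`, and the seed are arbitrary, and the helper name is hypothetical; expect agreement only within sampling noise.

```python
import random
import statistics

random.seed(42)  # reproducible run


def simulate_heads(n, p, runs=20_000):
    """Repeat an n-toss experiment `runs` times; return the head count of each run."""
    return [sum(random.random() < p for _ in range(n)) for _ in range(runs)]


n, p = 20, 0.3
counts = simulate_heads(n, p)
print(statistics.fmean(counts), n * p)  # sample mean vs. np
print(statistics.pvariance(counts), n * p * (1 - p))  # sample variance vs. np(1-p)
```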

Problems

1. Suppose you have an unbiased coin; that is, it lands heads and tails with equal probability. You toss it 16 times and count the total number of heads you get. What's the standard deviation of the total number of heads? Now toss it 256 times. What's the standard deviation now?

2. Suppose you have the same setup as in the last problem, with an unbiased coin. You give this coin to Bob so he can perform experiments on it. You want Bob to estimate the probability that the coin will land heads. Bob tosses it 16 times. He defines the probability of it landing heads to be $p = (\text{total number of heads})/(\text{total number of trials})$.

(a) On average, what result will Bob obtain?

(b) You want to estimate the standard deviation in Bob's result. Using the formula for the standard deviation of the binomial distribution, calculate the standard deviation of the value of $p$ that Bob obtains.

(c) What happens to the standard deviation when Bob tosses the coin 256 times instead?