We’ll consider both continuous and discrete variables. They’re quite similar once you get the hang of it, and both have important applications.
Let’s go back to the coin game above where you win $1 if it lands heads and you have to pay $1 if it lands tails. You flip the coin twice. The first thing to consider is a probability distribution involving two variables. We’ll choose the first variable $x$ to be the amount won (or lost) on the first flip. The second variable $y$ is the total amount won over both flips. We want to know, for example, the probability that you make $1 on the first flip but make $0 overall. Generalizing, we want to know $P(x,y)$, the probability of making $x$ dollars on the first flip and $y$ dollars overall.
Let’s determine $P(x,y)$. There are 3 possible values of $y$: $-2$, $0$, and $2$. There are two possible values of $x$: $1$ and $-1$. There are four equally likely outcomes for these two flips. So what’s $P(-1,-2)$? Well, there’s only one way of getting that: you have to get tails on both flips. So $P(-1,-2)=1/4$. Similarly $P(1,2)=1/4$. How about $P(-1,2)$? Well, you can’t lose on the first flip and make money on both flips, so $P(-1,2)=0$. Similarly $P(1,-2)=0$. How about $P(-1,0)$? There’s only one way of getting that: you’ve got to make $1 on the second flip, so $P(-1,0)=1/4$. Similarly $P(1,0)=1/4$. So we’ve determined the values of this two-dimensional probability distribution.
$$
\begin{array}{c|ccc}
P(x,y) & y=-2 & y=0 & y=2 \\
\hline
x=-1 & 1/4 & 1/4 & 0 \\
x=1 & 0 & 1/4 & 1/4
\end{array}
\tag{1.22}
$$
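As a quick check, we can rebuild this table by brute-force enumeration of the four equally likely outcomes (a sketch in Python; the variable names are ours):

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair flips.
# Heads pays +$1, tails costs $1; x = first-flip winnings, y = total.
P = {}
for first, second in product([1, -1], repeat=2):
    x, y = first, first + second
    P[(x, y)] = P.get((x, y), 0) + 0.25

for key in sorted(P):
    print(key, P[key])  # the four nonzero entries of eqn (1.22)
```

The pairs $(x,y)$ that never appear, like $(-1,2)$, simply get no entry, i.e. probability zero.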
Now let’s ask for the average value of various quantities. First, what’s the average value of $x$? You should already know the answer, but how do you calculate it from here?
$$\langle x \rangle = \sum_{x,y} x\, P(x,y) \tag{1.23}$$
Here we’re summing over all possible values of $x$ and $y$. To do this sum, we can do the sum over $y$ first.
$$\langle x \rangle = \sum_{x} x \sum_{y} P(x,y) \tag{1.24}$$
From the table above, when you do the sum over $y$ you add up all the entries in a row. Both of these row sums are $1/2$. So
$$\langle x \rangle = (1)\tfrac{1}{2} + (-1)\tfrac{1}{2} = 0 \tag{1.25}$$
as it should be.
Now let’s do a trickier one. What’s $\langle xy \rangle$?
$$\langle xy \rangle = \sum_{x,y} x\, y\, P(x,y) \tag{1.26}$$
We’ll do this term by term.
$$
\begin{array}{c|ccc}
x\,y\,P(x,y) & y=-2 & y=0 & y=2 \\
\hline
x=-1 & 1/2 & 0 & 0 \\
x=1 & 0 & 0 & 1/2
\end{array}
\tag{1.27}
$$
You see the only nonzero entries are on the upper left hand and lower right hand corners. So adding up all these terms, you get
$$\langle xy \rangle = \tfrac{1}{2} + \tfrac{1}{2} = 1 \tag{1.28}$$
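Both averages can be checked in a few lines by summing directly over the joint distribution of the coin game (a sketch; the variable names are ours):

```python
from itertools import product

# Joint distribution of the coin game: x = first-flip winnings, y = total.
P = {}
for first, second in product([1, -1], repeat=2):
    P[(first, first + second)] = P.get((first, first + second), 0) + 0.25

# <x> = sum of x P(x,y); <xy> = sum of x y P(x,y), as in eqns (1.23), (1.26).
avg_x = sum(x * p for (x, y), p in P.items())
avg_xy = sum(x * y * p for (x, y), p in P.items())
print(avg_x, avg_xy)  # 0.0 1.0
```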
Suppose you want to compute the average of something like height ($h$) times weight ($w$) for a population of chipmunks. This involves two separate variables. You might have the distribution $P(h,w)$, which is a probability density: $P(h,w)\,dh\,dw$ is the probability of finding a chipmunk with height between $h$ and $h+dh$, and weight between $w$ and $w+dw$. In this case we can then define the average as
$$\langle hw \rangle = \int\!\!\int h\, w\, P(h,w)\, dh\, dw \tag{1.29}$$
This is for a continuous distribution. For a discrete one, you’d replace the integrals by sums, as we saw above in the one-dimensional case.
Suppose that you have two variables that are independent. The case of height and weight we used above is not a good example of this, because you might expect that a taller chipmunk would tend to weigh more. However, we can think of something else: let’s take two blades of grass at opposite sides of a lawn. Call the length of the first blade $l_1$ and the second one $l_2$. You’d expect their lengths to be quite independent. That means $P(l_1,l_2) = P_1(l_1)\,P_2(l_2)$. Now we see that
$$\langle l_1 l_2 \rangle = \int\!\!\int l_1\, l_2\, P_1(l_1)\, P_2(l_2)\, dl_1\, dl_2 = \left(\int l_1\, P_1(l_1)\, dl_1\right)\left(\int l_2\, P_2(l_2)\, dl_2\right) = \langle l_1 \rangle \langle l_2 \rangle \tag{1.30}$$
This says that for independent variables, the average of the product is the product of the averages. This property makes independent variables very important conceptually in the study of probability.
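Here is a discrete sketch of the same factorization, using two fair dice as stand-ins for the two blades of grass (exact arithmetic via `fractions`, so the equality is not a floating-point accident):

```python
from fractions import Fraction

# Two independent discrete variables: the joint distribution factorizes,
# P(l1, l2) = P1(l1) * P2(l2), so <l1 l2> = <l1><l2>.
P1 = {v: Fraction(1, 6) for v in range(1, 7)}
P2 = {v: Fraction(1, 6) for v in range(1, 7)}

avg_product = sum(l1 * l2 * P1[l1] * P2[l2] for l1 in P1 for l2 in P2)
avg1 = sum(l1 * p for l1, p in P1.items())   # <l1> = 7/2
avg2 = sum(l2 * p for l2, p in P2.items())   # <l2> = 7/2

print(avg_product == avg1 * avg2)  # True
```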
Let’s see how we can apply it to examples of problems where the variables aren’t independent. Consider the example we went through above of the two coin flips in section 1.5.6. There we calculated $\langle xy \rangle$: the average of the result of the first flip times the total. Note that these two variables are not independent. If the first coin lands tails, you pay $1. That will affect the total amount of money that you make.
The trick is to find a set of variables that are independent. In this case that’s not too hard. Call the result of the second flip $x'$. The results of the first and second flips are independent. So now we express the total in terms of these independent variables: $y = x + x'$. So
$$\langle xy \rangle = \langle x(x+x') \rangle = \langle x^2 \rangle + \langle x x' \rangle \tag{1.31}$$
Now since $x$ can only take the values $\pm 1$, $\langle x^2 \rangle = 1$. In the second term we can use the independence of $x$ and $x'$ to say $\langle x x' \rangle = \langle x \rangle \langle x' \rangle = 0$. So
$$\langle xy \rangle = 1 + 0 = 1 \tag{1.32}$$
as we obtained before.
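The decomposition can be verified numerically: compute $\langle xy \rangle$ directly over the four outcomes, and compare with $\langle x^2 \rangle + \langle x \rangle \langle x' \rangle$ (a sketch; the helper `avg` is ours):

```python
from itertools import product

# x = first flip's winnings, x' = second's; each outcome has probability 1/4.
flips = [1, -1]

def avg(f):
    # average of f(x, x') over the four equally likely flip pairs
    return sum(f(a, b) * 0.25 for a, b in product(flips, repeat=2))

lhs = avg(lambda x, xp: x * (x + xp))                  # <xy> directly, y = x + x'
rhs = avg(lambda x, xp: x * x) + avg(lambda x, xp: x) * avg(lambda x, xp: xp)
print(lhs, rhs)  # 1.0 1.0
```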
1. Consider what happens when you throw two dice. You can ask the question “what’s the probability of getting a total of 8 when the first one is 5?” Or in more generality: what’s the probability of getting a total of $t$ when the first one is $f$? The answer is a function of two variables, $f$ and $t$. You could call the result $P(f,t)$. How do you define averages in this case? For example, what’s the average value of $ft$?
You could determine $\langle ft \rangle$ by throwing the dice repeatedly, taking the number on the first die and multiplying it by the total. What you get should be equivalent to
$$\langle ft \rangle = \sum_{f,t} f\, t\, P(f,t) \tag{1.33}$$
but that’s a sum over a lot of possibilities: all possible values of $f$ and $t$. Instead, use independence to solve this problem in analogy to what we just did for the two coin flips.
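If you want to check your answer to this exercise numerically, a brute-force sum over all 36 equally likely rolls can be compared against the independence route (a sketch, assuming fair dice; $s$ here denotes the second die, so $t = f + s$):

```python
from fractions import Fraction
from itertools import product

# Brute force: sum f*t over all 36 equally likely (first, second) rolls.
outcomes = list(product(range(1, 7), repeat=2))
avg_ft = sum(Fraction(f * (f + s), 36) for f, s in outcomes)

# Independence route: <ft> = <f(f+s)> = <f^2> + <f><s>, with <s> = <f>.
avg_f = Fraction(sum(range(1, 7)), 6)                   # <f> = 7/2
avg_f2 = Fraction(sum(v * v for v in range(1, 7)), 6)   # <f^2> = 91/6

print(avg_ft == avg_f2 + avg_f * avg_f)  # True
```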
Now we’re going to analyze an important example in a couple of different ways: what’s the average value of a variable distributed according to the binomial distribution?
Where does the binomial distribution come from? Flipping a coin $N$ times, where the probability of a head is $p$ and of a tail is $q = 1-p$. You can assign a variable $x_i$ to the $i$th flip, saying that if it lands tails, $x_i = 0$: you make nothing; but if it lands heads, $x_i = 1$: that is, you make $1. So the sum of these variables is just the number of times the coin landed heads. Earlier we called that sum $X$; see eqn. 1.14. We know the distribution $P(X)$ is the binomial distribution, eqn. 1.9. So
$$\langle X \rangle = \sum_{X=0}^{N} X\, P(X) = \sum_{X=0}^{N} X \binom{N}{X} p^X q^{N-X} \tag{1.36}$$
You can do this sum using a differentiation trick, but this is not the best way to find the average. Let’s go back to our earlier approach.
First consider just one trial, $N = 1$. You flip a coin. What’s the probability of it coming up heads? That’s just the definition of $p$. So what’s the average value of $x_i$? It’s $p$ too; if you want to see this more mathematically, just start with the definition of an average, eqn. 1.16. In this case you have $\langle x_i \rangle = (1)p + (0)q = p$. This is how much you expect to make in one trial. Now we’re interested in the total amount you make after $N$ trials. Well, as we reasoned before, you can just multiply the result for one toss by $N$, so the answer is $Np$. Saying this in mathematical notation:
$$\langle X \rangle = \Big\langle \sum_{i=1}^{N} x_i \Big\rangle = \sum_{i=1}^{N} \langle x_i \rangle = Np \tag{1.37}$$
This uses the fact that the average of a sum is the sum of the averages, as we discovered before. So now we know the mean of the binomial distribution. It’s just (number of trials) $\times$ (the mean for one trial), i.e. $Np$.
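You can confirm that the explicit sum in eqn. (1.36) agrees with $Np$ for any particular choice of $N$ and $p$ (a sketch; the values below are arbitrary):

```python
from math import comb

# Do the explicit binomial sum <X> = sum_X X * C(N, X) p^X q^(N-X)
# and compare with the shortcut answer N*p.
N, p = 10, 0.3
q = 1 - p
mean = sum(X * comb(N, X) * p**X * q**(N - X) for X in range(N + 1))
print(mean)  # should come out to N*p = 3, up to floating-point rounding
```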
Now some of you probably wish I’d done it the longer way, using the calculus trick. Try it out if you’re so inclined; it’s not that bad. However, it’s always best to try to come up with the most elegant and intuitive derivations that you can. You understand the nature of the problem much better that way. Once you can do that, you’ll have a shot at understanding and solving really hard problems, something you won’t be able to do if you don’t follow this advice.