You can see that as the number of trials increases, this distribution sharpens more and more. We’ll quantify how this happens a bit later on. But this property is quite important: it is really the whole basis for why statistics works.
Suppose you want to know the probability that a coin will come up heads or tails. You could guess it is 1/2, but maybe it’s a trick coin. You really want to do experiments to figure it out. If you toss it twice, you might get a head and a tail. So that means the probability is 1/2, or does it? Someone else might toss the same coin and this time get two heads. So does that mean the probability of a head is 1? Not at all. You’d have to repeat the experiment with many more trials to get an accurate estimate. Even after 50 trials, you’re not going to get exactly 1/2; the distribution of results is the one shown above in figure 1.3.9. You’re pretty likely to get an estimate a little below or a little above 1/2. The more you repeat the experiment, the more precisely you can determine the probability of getting a head, because, at least on this scale, the distribution is getting sharper.
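To make the experiment concrete, here is a minimal simulation sketch. The function name `estimate_heads_probability`, the trial counts, and the use of a seeded pseudo-random generator in place of a real coin are all assumptions made for illustration; they are not part of the discussion above.

```python
import random


def estimate_heads_probability(n_trials, p_true=0.5, seed=None):
    """Toss a simulated coin n_trials times and return heads / n_trials.

    p_true is the (normally unknown) probability of heads; it is a
    hypothetical parameter that lets us imitate a fair or a trick coin.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return heads / n_trials


if __name__ == "__main__":
    # Illustrative trial counts: two tosses, fifty tosses, and many more.
    for n in (2, 50, 1000, 100000):
        print(n, estimate_heads_probability(n, seed=n))
```

With only two tosses the estimate can only be 0, 1/2, or 1, which is why it says so little. As the number of tosses grows, the printed estimates should settle closer to `p_true`.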
But it’s good that it gets sharper. That means that the ratio (total number of heads you see)/(total number of trials) becomes more and more well defined, in this case 1/2. That’s the reason statistics actually works and why people try to do experiments with large amounts of data. One data point doesn’t tell you much, but many data points are likely to tell you something.
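One way to see the ratio becoming well defined, ahead of the fuller treatment promised later, is to compute exactly how likely the ratio is to land near 1/2. The sketch below sums binomial probabilities for a fair coin. The tolerance of 0.1, the function name `prob_fraction_within`, and the chosen trial counts are assumptions picked for illustration only.

```python
from math import comb


def prob_fraction_within(n_trials, tolerance=0.1, p=0.5):
    """Probability that heads/n_trials lies within `tolerance` of p.

    Sums the binomial probabilities C(n, k) p^k (1-p)^(n-k) over every
    head count k that satisfies |k/n - p| <= tolerance. The comparison
    is done on counts, with a tiny slack to absorb rounding error.
    Intended for modest n (up to about 1000); larger n can overflow
    or underflow in floating point.
    """
    total = 0.0
    for k in range(n_trials + 1):
        if abs(k - p * n_trials) <= tolerance * n_trials + 1e-9:
            total += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
    return total


if __name__ == "__main__":
    for n in (2, 10, 50, 200, 1000):
        print(n, prob_fraction_within(n))
```

For these trial counts the printed probability climbs toward 1 as the number of tosses grows, which is the sharpening described above expressed as a single number.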