You can see that as the number of trials increases, this distribution gets sharper and sharper. We’ll quantify how this happens a bit later on. But this property is quite important and is really the whole basis of why statistics works.

Suppose you want to know the probability that a coin will come up heads. You could guess it’s $.5$, but maybe it’s a trick coin, so you really want to do experiments to figure it out. If you toss it twice, you might get a head and a tail. So that means the probability is $.5$, or does it? Someone else might toss the same coin and this time get two heads. So does that mean the probability of a head is $1$? Not at all. You’d have to repeat the experiment with many more trials to get an accurate estimate. Even after 50 trials, you probably won’t get exactly $.5$; the distribution of results is the one shown above in figure 1.3.9. You’re pretty likely to get an estimate like $.46$ or $.52$. The more you repeat the experiment, the more precisely you can determine the probability of getting a head, because, at least on this scale, the distribution is getting sharper.

But it’s good that it gets sharper. It means that the ratio (total number of heads you see)/(total number of trials) becomes more and more well defined, in this case $.5$. That’s the reason statistics actually works, and why people try to do experiments with large amounts of data. One data point doesn’t tell you much, but $10^{6}$ data points are likely to tell you something.
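If you’d like to watch the sharpening happen, here is a minimal simulation sketch. It is not part of the original discussion: the function names `heads_fraction` and `spread_of_estimates`, the fair-coin default `p_heads=0.5`, the number of repeated experiments, and the fixed seed are all assumptions chosen for illustration. Each "experiment" tosses a simulated coin a given number of times and records heads/trials; repeating the experiment many times shows how widely those estimates scatter.

```python
import random
import statistics


def heads_fraction(n_tosses, p_heads=0.5, rng=random):
    """Toss a simulated coin n_tosses times; return the observed fraction of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses


def spread_of_estimates(n_tosses, n_experiments=1000, p_heads=0.5, seed=0):
    """Repeat the experiment n_experiments times.

    Returns (mean, standard deviation) of the estimated probabilities.
    The seed is fixed only so that reruns give the same numbers.
    """
    rng = random.Random(seed)
    estimates = [
        heads_fraction(n_tosses, p_heads, rng) for _ in range(n_experiments)
    ]
    return statistics.mean(estimates), statistics.pstdev(estimates)


if __name__ == "__main__":
    # Two tosses, fifty tosses, a thousand tosses per experiment.
    for n in (2, 50, 1000):
        mean, spread = spread_of_estimates(n)
        print(f"{n:>5} tosses: mean estimate {mean:.3f}, spread {spread:.3f}")
```

The spread printed for each row is the width of the distribution of estimates. With only two tosses the estimates are all over the place, and the spread shrinks steadily as the number of tosses per experiment goes up, which is the sharpening described above.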