In many ways, averaging is the most important application of probability. People think in terms of averages: "What was the average on the midterm?", "His batting average is .302". But "Bob is an average Joe" is not a place we want to go.

There are many different types of averaging. Mode and median are two kinds you may have heard of, but here we'll concentrate on taking the arithmetic mean. For the moment, we'll use average and mean synonymously to denote summing up the data points and dividing by the total number. The distinction between different kinds of averaging will become crucial when we discuss statistics.
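To make the distinction concrete, here is a small sketch using Python's standard `statistics` module. The scores are made-up illustrative data, chosen so that the mean, median, and mode all come out different:

```python
import statistics

# Made-up sample of midterm scores (hypothetical data)
scores = [5, 5, 6, 8, 10]

print(statistics.mean(scores))    # arithmetic mean: 34 / 5 = 6.8
print(statistics.median(scores))  # middle value of the sorted data: 6
print(statistics.mode(scores))    # most frequent value: 5
```

The three notions of "average" agree only for symmetric, single-peaked data; here they disagree, which is why the choice of averaging matters.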

So let's talk about how this works. Bob could average the blades of grass that he measured. You just add up all the data for their lengths and divide by the number of data points. In more mathematical notation, if you have $N$ data points $x_1, x_2, \dots, x_N$ then the average is defined as

$$\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (1.13)$$

This notation $\langle x \rangle$ is one common notation for an average. Another is $\overline{x}$. I find the $\overline{x}$ notation more confusing when writing equations with bars on top of bars, so I'll use it less often.
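Eqn. 1.13 translates directly into code. Here is a minimal sketch; the grass lengths are invented for illustration, not Bob's actual measurements:

```python
def average(xs):
    """Arithmetic mean <x> = (1/N) * sum over x_i, as in eqn. 1.13."""
    return sum(xs) / len(xs)

# Hypothetical grass lengths in cm
lengths = [5.8, 6.3, 6.1, 5.9, 6.4]
print(average(lengths))  # (5.8 + 6.3 + 6.1 + 5.9 + 6.4) / 5 = 6.1
```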

If you performed this sort of average on Bob's data you'd get something fairly close to, but not exactly, $6\,\mathrm{cm}$. With just 5 data points you might get 6.29, with 100 you might get 5.9, and with 10000 you might get 6.01.

This leads to a subtle but important distinction between this kind of averaging and an "expectation value". I'm not going to dwell on the precise definition of that term, but I will explain how it differs from the kind of average we took above in eqn. 1.13. An expectation value is an extreme form of averaging: it's what you'd get if you averaged over a virtually infinite number of blades of grass. In this case it'd be 6. This is, in some sense, the true value of the average for a population. The difference between this value and the averaging described above is that a finite average deviates from the true value. This is inherent in anything involving probability: with a finite number of measurements you're never going to be able to determine anything precisely.
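You can watch the finite-sample deviations shrink in a quick simulation. As an assumption for illustration, take the grass lengths to be normally distributed with a "true" (expectation) value of 6 cm and a spread of 1 cm:

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

TRUE_MEAN = 6.0  # the expectation value (assumed model, in cm)
SPREAD = 1.0     # assumed spread of individual blade lengths (cm)

for n in (5, 100, 10000):
    sample = [random.gauss(TRUE_MEAN, SPREAD) for _ in range(n)]
    print(n, sum(sample) / len(sample))
```

Each run gives sample averages that wander around 6, with the wandering getting smaller as the number of data points grows, just as in the 5 / 100 / 10000 example above.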