Linear Regression

Suppose that, in the example of section 1.8, you want to predict the height of a Swedish man from his weight. The simplest approach is to look for a linear relationship between his weight $x$ and his height $y$. We write this in the standard form $y = ax + b$, where $a$ is the slope and $b$ is the intercept with the $y$-axis.

\begin{figure}\begin{center}
\epsfig{width=.4\textwidth,file=WeightHeightLine.eps}
\end{center}\end{figure}

The red line is a best fit to the data. There are many ways of finding such a line, but the most common is called ``linear regression''. You'll never get a perfect fit: there will be an error for each data point $i$, which is the difference between the value you measured, $y_i$, and the value on the line, $ax_i + b$. So the error for the $i$th point is $\epsilon_i = y_i - (ax_i + b)$.

You find the values of $a$ and $b$ that best fit the data by minimizing a measure of the total error. This measure is normally taken to be the sum of the squares of the individual errors (technically, these individual errors are called residuals):

\begin{displaymath}
\sum_i \epsilon_i^2 = \sum_i (y_i - (ax_i + b))^2
\end{displaymath} (1.64)
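
As a minimal sketch of this error measure, the following Python snippet evaluates the sum of squared residuals for a candidate line. The function name \texttt{sum\_squared\_residuals} and the four data points are made up for illustration; they are not the data shown in the figure.

```python
import numpy as np

def sum_squared_residuals(x, y, a, b):
    """Return sum_i (y_i - (a*x_i + b))**2 for the line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (a * x + b)      # epsilon_i for each data point
    return float(np.sum(residuals ** 2))

# Hypothetical data: weights in kg, heights in cm (illustrative only)
weights = [60.0, 70.0, 80.0, 90.0]
heights = [170.0, 175.0, 182.0, 185.0]

# Two candidate lines: the one with the smaller value fits better
err_sloped = sum_squared_residuals(weights, heights, a=0.5, b=140.0)
err_flat = sum_squared_residuals(weights, heights, a=0.0, b=178.0)
print(err_sloped, err_flat)
```

Comparing the two printed values shows how the measure ranks candidate lines: the smaller the sum, the better the fit.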

You minimize this with respect to $a$ and $b$, by setting both partial derivatives to zero, and get
\begin{displaymath}
a = C(x,y)/\mathrm{var}(x)
\end{displaymath} (1.65)
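
Here is a sketch of where this comes from. It assumes that $C(x,y)$ denotes the covariance $\langle xy\rangle - \langle x\rangle\langle y\rangle$ and that the variance of $x$ is $\langle x^2\rangle - \langle x\rangle^2$, with $\langle\cdot\rangle$ an average over the $N$ data points. Setting the derivatives of the sum of squares with respect to $b$ and $a$ to zero gives
\begin{displaymath}
\sum_i (y_i - ax_i - b) = 0, \qquad \sum_i x_i (y_i - ax_i - b) = 0
\end{displaymath}
Dividing both conditions by $N$ turns them into statements about averages:
\begin{displaymath}
\langle y\rangle = a\langle x\rangle + b, \qquad \langle xy\rangle = a\langle x^2\rangle + b\langle x\rangle
\end{displaymath}
Eliminating $b$ between these two equations leaves
\begin{displaymath}
\langle xy\rangle - \langle x\rangle\langle y\rangle = a\left(\langle x^2\rangle - \langle x\rangle^2\right)
\end{displaymath}
which is the slope formula above, since the left side is the covariance and the bracket on the right is the variance of $x$.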

You get $b$ by averaging $y = ax + b$ over the data points (for the best-fit line the residuals average to zero): $\langle y\rangle = a\langle x\rangle + b$, so
\begin{displaymath}
b = \langle y \rangle - a\langle x\rangle
\end{displaymath} (1.66)
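
The two formulas can be checked numerically. The sketch below assumes that $C(x,y)$ is the covariance and that both it and the variance are computed as plain averages over the data (normalized by $N$). The function name \texttt{fit\_line} and the data are hypothetical, and NumPy's \texttt{polyfit} is used only as an independent cross-check.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))   # C(x, y)
    var_x = np.mean((x - mean_x) ** 2)              # var(x)
    a = cov_xy / var_x                              # slope
    b = mean_y - a * mean_x                         # intercept
    return a, b

# Hypothetical data: weights in kg, heights in cm (illustrative only)
weights = [60.0, 70.0, 80.0, 90.0]
heights = [170.0, 175.0, 182.0, 185.0]

a, b = fit_line(weights, heights)
# Degree-1 polynomial fit returns [slope, intercept]
a_ref, b_ref = np.polyfit(weights, heights, 1)
print(a, b)
print(a_ref, b_ref)
```

Both approaches should agree to within rounding error, because a degree-one least-squares polynomial fit minimizes the same sum of squared residuals.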

Here's a nice applet demonstrating these concepts.

Josh Deutsch 2009-03-05