### Example: The Calculus of Fitting

The graph shows data on the monthly energy use by a house (measured in "therms") versus the heating-degree-days for each month.

What's the relationship between the two variables? A linear model is pretty reasonable here:

$$\text{therms} = m \cdot \text{hdd}.$$

What should m be? Introduce the idea of residuals and the criterion of picking the best m by minimizing the sum of squares of the residuals.

How you choose to do this might depend on the course you are teaching.

**Calculus?** Perhaps write down the sum of squared residuals:

$$\sum_{i=1}^{n} (y_i - m x_i)^2.$$

With $x_i$ and $y_i$ known (they are the data, after all), this is a function of $m$ alone. If you want to do classical calculus optimization, differentiate that function with respect to $m$ and set the result to zero. That gives the formula for the best-fitting linear function in terms of $\sum x_i^2$ and $\sum x_i y_i$; the third sum that appears when the square is expanded, $\sum y_i^2$, does not depend on $m$ and drops out of the derivative.
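For reference, the differentiation step can be written out as below. The name $S(m)$ for the sum of squared residuals is introduced here only for convenience, and the last step assumes that not every $x_i$ is zero.

$$
S(m) = \sum_{i=1}^{n} (y_i - m x_i)^2,
\qquad
\frac{dS}{dm} = -2 \sum_{i=1}^{n} x_i \,(y_i - m x_i) = 0
\;\Longrightarrow\;
m = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.
$$

Since $d^2S/dm^2 = 2\sum x_i^2 > 0$, this stationary point is a minimum.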

**Statistics?** With the data in a spreadsheet, calculate the sum of squared residuals explicitly for some given m and plot that point. Then try another m, and so on. Gradually, a parabola will emerge, reminding students of what they learned in algebra and calculus.
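The same trial-and-error exercise can be sketched in a few lines of Python instead of a spreadsheet. The numbers below are toy values invented for illustration (they are not the house data from the graph), and the names `sum_sq_residuals` and `candidates` are arbitrary choices.

```python
# Toy data invented for illustration; NOT the house data from the graph.
hdd = [100.0, 300.0, 500.0, 700.0, 900.0]    # heating-degree-days per month
therms = [22.0, 58.0, 101.0, 139.0, 182.0]   # energy use per month


def sum_sq_residuals(m, x, y):
    """Sum of squared residuals for the model y = m * x."""
    return sum((yi - m * xi) ** 2 for xi, yi in zip(x, y))


# Try a range of candidate slopes, one "spreadsheet row" per value of m.
candidates = [0.10 + 0.02 * k for k in range(11)]   # 0.10, 0.12, ..., 0.30
curve = [(m, sum_sq_residuals(m, hdd, therms)) for m in candidates]

for m, ssr in curve:
    print(f"m = {m:.2f}   sum of squared residuals = {ssr:10.1f}")

# The candidate with the smallest sum of squares sits at the bottom of the curve.
best_m, best_ssr = min(curve, key=lambda point: point[1])
print(f"best candidate: m = {best_m:.2f}")
```

Plotting the printed pairs gives the parabola; a finer grid of candidates narrows in on its lowest point.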

**Computer science?** Write a program to calculate the best fit. Then introduce the idea of random sampling with replacement and have students construct a resampling function, a useful exercise in indexing. By repeating the best fit on resampled data (called bootstrapping in statistics), your students can find the range of m consistent with the data.
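One way the resampling exercise might look is sketched below, using only the Python standard library. The data are toy values invented for illustration (not the house data from the graph), the function names are hypothetical, and the choice of 1000 trials and a central 95% range is an assumption made for the example.

```python
import random

# Toy data invented for illustration; NOT the house data from the graph.
hdd = [100.0, 300.0, 500.0, 700.0, 900.0]    # heating-degree-days per month
therms = [22.0, 58.0, 101.0, 139.0, 182.0]   # energy use per month


def best_fit_slope(x, y):
    """Least-squares slope for the through-the-origin model y = m * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def resample_indices(n, rng):
    """Draw n indices from 0..n-1 at random, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def bootstrap_slopes(x, y, trials, rng):
    """Refit the slope on `trials` resampled copies of the data."""
    slopes = []
    for _ in range(trials):
        idx = resample_indices(len(x), rng)
        # Index x and y with the SAME indices so each (x, y) pair stays together.
        slopes.append(best_fit_slope([x[i] for i in idx], [y[i] for i in idx]))
    return slopes


rng = random.Random(0)   # fixed seed so the run is repeatable
slopes = sorted(bootstrap_slopes(hdd, therms, 1000, rng))
low, high = slopes[25], slopes[974]   # roughly the central 95% of the slopes

print(f"fitted m = {best_fit_slope(hdd, therms):.4f}")
print(f"bootstrap range for m: {low:.4f} to {high:.4f}")
```

Keeping the index list separate from the data is the point of the exercise: the same indices pick out matching x and y values, so each month's pair is resampled as a unit.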

**Linear algebra?** Project the vector of y values down onto the
vector of x. That's what least-squares fitting is all about. The
coefficient m is an exercise in dot products.
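The projection can be sketched with nothing more than a dot-product helper. The data are toy values invented for illustration (not the house data from the graph), and `dot` is a hypothetical helper name.

```python
# Toy data invented for illustration; NOT the house data from the graph.
x = [100.0, 300.0, 500.0, 700.0, 900.0]    # heating-degree-days per month
y = [22.0, 58.0, 101.0, 139.0, 182.0]      # therms per month


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Projecting y onto x gives the vector m * x, with m a ratio of dot products.
m = dot(x, y) / dot(x, x)
projection = [m * xi for xi in x]

# What is left over (the residual vector) is perpendicular to x.
residual = [yi - pi for yi, pi in zip(y, projection)]

print(f"m = {m:.4f}")
print(f"residual . x = {dot(residual, x):.2e}  (zero up to rounding)")
```

The perpendicularity check is the geometric form of the least-squares condition: no multiple of x can be added to the projection to bring it any closer to y.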