In class we are learning about linear regression, fitting a line through data, where the line does not have to pass through the data points precisely.
But we might wonder: Can we find a model for our data that does pass precisely through each point, and won't this model be more useful at predicting the future?
In the meantime, you might want to play around with linear regression on your own.
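One way to experiment is with a few lines of Python. The sketch below assumes the usual least-squares definition of the regression line; the function name `fit_line` and the seven data values are made up for illustration and do not come from this page.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of products of deviations) / (sum of squared x deviations).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data (an assumption, not from the page): 7 values at x = 1..7.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3, 5, 4, 6, 5, 7, 6]

slope, intercept = fit_line(xs, ys)
for x, y in zip(xs, ys):
    predicted = slope * x + intercept
    print(f"x={x}  observed={y}  line={predicted:.2f}")
```

Notice that the line does not reproduce any of the observed values exactly; it only follows their overall trend.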
The following tool will allow you to play around with Lagrange Polynomials. Enter a list of 7 values into the table below, or click on a button to import prefetched data.
x:     1  2  3  4  5  6  7
f(x):  (the 7 values you enter)
The Lagrange Polynomial, below, passes through all the given data points.
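For reference, the general form of such a polynomial can be written as follows. The notation is an assumption made here rather than something taken from the tool: the data points are called (x_1, y_1), ..., (x_n, y_n), with y_i = f(x_i) and n = 7 for the table above.

```latex
P(x) = \sum_{i=1}^{n} y_i \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}
```

Each product equals 1 when x = x_i and 0 at every other x_j, which is why P(x_i) = y_i for every data point.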
Below is a table of values of the polynomial above, for various inputs, including inputs that correspond to past and future values!
Past: -2, -1, 0
Entered Values: 1, 2, 3, 4, 5, 6, 7
Future: 8, 9, 10
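A table like this can also be reproduced offline. The following is a minimal sketch, not the code behind the tool: the helper name `lagrange_eval` and the seven f(x) values are invented for illustration, and exact fractions are used so that no rounding hides what the polynomial does.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other xs[j].
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Made-up example data (an assumption, not from the page).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3, 5, 4, 6, 5, 7, 6]

# Same inputs as the table: past (-2..0), entered (1..7), future (8..10).
table = {x: lagrange_eval(xs, ys, x) for x in range(-2, 11)}
for x, value in table.items():
    label = "entered" if x in xs else ("past" if x < 1 else "future")
    print(f"{x:>3}  {label:<8} {value}")
```

With these made-up values, every entered f(x) lies between 3 and 7, yet the polynomial evaluates to -92 at x = 0 and -88 at x = 8.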
Graph:
Questions to think about:
The Lagrange Polynomial models the data you input exactly. Based on your observations, though, does it make sense to use Lagrange Polynomials to predict the future?
The Lagrange Polynomial that fits n data points is a polynomial of degree at most n-1. In the above case, n was 7. If you have 2 points, what would the Lagrange Polynomial look like?
In statistics, we try to find a balance between our model closely describing our observations and our model being simple. Think about the situation where our data points fall very close to a parabola.
Think about how you might argue that the best model is a parabola, not a line or Lagrange Polynomial.
Further reading/playing:
It may help to read about overfitting. Essentially, the Lagrange Polynomial is reading too much into the data.
By wanting our polynomial to fit our data exactly, we ignore other, simpler models that fit our data closely, but not precisely.
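To see this trade-off in numbers, here is a rough sketch that fits a line, a parabola, and an exact degree-6 polynomial to the same seven points. Everything in it is an assumption for illustration: the helper names `polyfit` and `poly_eval` are hypothetical, the data are the values of x^2 nudged by at most 1, and the fits use ordinary least squares solved through the normal equations. With 7 points, the degree-6 least-squares fit passes through every point, so it coincides with the Lagrange Polynomial.

```python
from fractions import Fraction


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree]."""
    n = degree + 1
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # Augmented normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    m = [
        [sum(x ** (r + c) for x in xs) for c in range(n)]
        + [sum(y * x ** r for x, y in zip(xs, ys))]
        for r in range(n)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + ... using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Made-up data close to the parabola y = x^2 (each value is off by at most 1).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 3, 10, 16, 24, 37, 48]

line = polyfit(xs, ys, 1)
parabola = polyfit(xs, ys, 2)
exact = polyfit(xs, ys, len(xs) - 1)  # passes through every point

for name, coeffs in [("line", line), ("parabola", parabola), ("degree 6", exact)]:
    print(f"{name:<9} prediction at x=8: {float(poly_eval(coeffs, 8)):.2f}")
```

For this made-up data the underlying parabola gives 64 at x = 8. The line predicts about 51.4, the fitted parabola about 63.4, and the exact degree-6 polynomial predicts 30, even though it is the only one of the three that matches every entered value.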