The following tool will allow you to generate bivariate data sets and compute linear regressions.
To generate your data set (or skip these steps and just press "Generate!"):
  1. Choose N (the number of data points). The tool was designed with N ≤ 42 in mind; N greater than 100 may cause a noticeable delay.
  2. Choose a method for generating the x coordinates.
  3. Choose coefficients for generating the y coordinates.
  4. Press "Generate!"
  5. Press "Generate!" again to rerandomize.
  6. Was this helpful? Confusing? Didn't work correctly in your browser? Feedback sent to thomas.kern@oswego.edu is greatly appreciated.

Choose how you would like to generate your data:
N: the number of data points.
Generate x: Evenly Spaced, Random (Uniform Distribution), or Random (Normal Distribution) with a chosen Mean and St.Dev.
Generate y: as a function of x, in the form (coefficient) x² + (coefficient) x + (constant term), plus a random displacement with a chosen St.Dev.
Hint: Leave entries as 0 to not include them.
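The data-generation controls can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's own code: the function names generate_x and generate_y are hypothetical, Min x and Max x are assumed to set the range for the evenly spaced and uniform options, and the random displacement is assumed to be normally distributed with mean 0.

```python
import random

def generate_x(n, method, rng, x_min=0.0, x_max=10.0, mean=0.0, sd=1.0):
    """Return n x-coordinates using one of the three generation methods."""
    if method == "even":
        # Evenly spaced from x_min to x_max inclusive (assumed range)
        if n == 1:
            return [x_min]
        step = (x_max - x_min) / (n - 1)
        return [x_min + i * step for i in range(n)]
    if method == "uniform":
        # Uniform distribution on [x_min, x_max]
        return [rng.uniform(x_min, x_max) for _ in range(n)]
    if method == "normal":
        # Normal distribution with the chosen mean and standard deviation
        return [rng.gauss(mean, sd) for _ in range(n)]
    raise ValueError(f"unknown method: {method!r}")

def generate_y(xs, a, b, c, noise_sd, rng):
    """Return y = a*x^2 + b*x + c plus a random displacement for each x.

    The displacement is ASSUMED to be normal with mean 0 and st.dev. noise_sd;
    noise_sd = 0 puts every point exactly on the curve.
    """
    return [a * x * x + b * x + c + rng.gauss(0.0, noise_sd) for x in xs]

rng = random.Random(42)  # fixed seed so the run is reproducible
xs = generate_x(10, "normal", rng, mean=5.0, sd=2.0)
ys = generate_y(xs, 0.0, 3.0, 1.0, 1.0, rng)  # y = 3x + 1 plus noise with st.dev. 1
```

Setting a coefficient to 0 drops that term, which matches the hint about leaving entries as 0.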
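Results: μx and μy (the means of x and y), σx and σy (their standard deviations), ρ (the correlation coefficient), ρ², and Linear. The residuals (each point's vertical distance from the regression line) are also shown.
Display options: Automatically fit data in window, Maintain aspect ratio, Draw axes, Draw regression line, Draw residual lines, plus Min x, Max x, Min y, and Max y.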
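The summary statistics and residuals can be reproduced with a short Python sketch using the standard least-squares formulas: slope = Sxy / Sxx, intercept = μy − slope · μx, and ρ = Sxy / √(Sxx · Syy), where Sxx, Syy, and Sxy are sums of squared deviations and cross-products. The function name regression_summary is hypothetical, and dividing by N (population standard deviation) is an assumption; the tool may divide by N − 1 instead, which changes σx and σy but not ρ or the regression line.

```python
import math

def regression_summary(xs, ys):
    """Means, standard deviations, correlation, least-squares line, residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # rho is undefined if all x values (or all y values) are equal
    rho = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed y minus the y predicted by the regression line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return {
        "mu_x": mean_x,
        "mu_y": mean_y,
        "sigma_x": math.sqrt(sxx / n),  # ASSUMPTION: population st.dev. (divide by N)
        "sigma_y": math.sqrt(syy / n),
        "rho": rho,
        "rho_squared": rho ** 2,
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
    }

summary = regression_summary([1, 2, 3, 4], [2, 4, 5, 8])
print(summary["slope"], summary["intercept"], summary["rho"])
```

For this small data set the line is approximately y = 1.9x, and the residuals sum to (essentially) zero, as they always do for a least-squares line with an intercept.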

Questions to think about:
  1. Set the coefficient of x² under "generate y as a function of x" to 0, and the standard deviation under "plus a random displacement" to 1. How is the regression line the tool finds related to the "generate y as a function of x" line? Try generating new data several times and adjusting the coefficients under "generate y as a function of x".
  2. Set the coefficients of x² and x under "generate y as a function of x" to 0, and the standard deviation under "plus a random displacement" to a nonzero value.
    1. Press "Generate!" many times. How frequently does the tool detect a linear relationship even though x and y are generated completely independently?
    2. Use the iteration tool to streamline the process: enter a Number of iterations, and the tool reports the number of Linear relations found.
    3. These misclassifications are called "false positives" or "type I errors". They occur naturally and unavoidably whenever a statistical test is applied to random data. We will talk about them later in class; for now, you can read more about them below.
    4. What happens to the graph as you change the standard deviation under "plus a random displacement"?
  3. Set the method for generating x to "evenly spaced" and the standard deviation under "generate y: plus a random displacement" to 0. Set the coefficient of x² under "generate y as a function of x" to 1. By adjusting the coefficient of x, the constant term, and the upper and lower bounds for x, can you get ρ close to 0? Close to 1? Close to -1?
  4. Set the coefficient of x² under "generate y as a function of x" to 0. Experiment with the coefficient of x and the standard deviation of the random displacement, and look for patterns in when you are likely to get a linear relationship.
  5. Set "generate y as a function of x" to 1x² + 20x + 1, and the standard deviation of "plus a random displacement" to 0. How good is the linear regression model? You may wish to uncheck "Maintain aspect ratio" to get a better plot. Compare the linear regression model with the quadratic model y = x² + 20x + 1, which would fit the data perfectly. What do the residuals look like?
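Question 2 can also be explored outside the tool with a simulation. The page does not say what criterion the tool uses to decide that a linear relationship exists, so this sketch ASSUMES a two-sided test of ρ = 0 at significance level α = 0.05, using the Fisher z approximation (atanh(r) · √(N − 3) is roughly standard normal when x and y are independent). The function names are hypothetical. Under that assumption, the fraction of false positives should land near α.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def looks_linear(r, n, alpha=0.05):
    """ASSUMED decision rule: Fisher z test of rho = 0 (not necessarily the tool's rule)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def false_positive_rate(n=20, iterations=2000, noise_sd=1.0, seed=0):
    """Fraction of independent (x, y) data sets that get flagged as linear."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]  # evenly spaced x
    found = 0
    for _ in range(iterations):
        # y ignores x entirely: coefficients of x^2 and x are 0, only noise remains
        ys = [rng.gauss(0.0, noise_sd) for _ in xs]
        if looks_linear(pearson_r(xs, ys), n):
            found += 1
    return found / iterations

print(false_positive_rate())  # expect a value near 0.05
```

Each flagged data set is a type I error, since the data were generated with no relationship between x and y.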
Further reading/playing: