Regression line:
a straight line that describes how a response variable y changes as an explanatory variable x changes. One variable explains or predicts the other.
It may be used to predict the value of y for a given value of x.
Least-squares regression line:
the unique line such that the sum of the squared vertical (y) distances between the data points and the line is as small as possible.
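The least-squares criterion can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (`xs`, `ys`) and hypothetical helper names `sse` and `least_squares`. It fits the line with the standard closed-form formulas and checks that nudging the intercept or the slope only increases the sum of squared vertical distances.

```python
def sse(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)

# Shift the intercept or the slope slightly in either direction:
# every perturbed line should have a larger sum of squares.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
all_worse = all(sse(xs, ys, a + da, b + db) > best for da, db in nudges)
print(a, b, best, all_worse)
```

Four nudges do not prove uniqueness, but they illustrate what "smallest possible" means for this criterion.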
Facts about least-squares regression:
1. The distinction between explanatory and response variables is essential in regression.
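2. There is a close connection between correlation and the slope of the least-squares line: a change of one standard deviation in x corresponds to a change of r standard deviations in the predicted y.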
3. The least-squares regression line always passes through the point (x̄, ȳ), the means of x and y.
4. The correlation r describes the strength of a straight-line relationship. The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
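Equation of least-squares regression line:
ŷ = a + bx, with slope b = r(s_y / s_x) and intercept a = ȳ - b·x̄, where s_x and s_y are the standard deviations of x and y.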
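As a sketch of how the equation ties to the facts listed above, the snippet below uses the standard textbook form ŷ = a + bx with slope b = r·s_y/s_x and intercept a = ȳ - b·x̄. The data and the helper name `predict` are invented for illustration. It also checks that the fitted line passes through (x̄, ȳ).

```python
from statistics import mean, stdev

# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation r: average product of the standardized x and y values.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept


def predict(x):
    """Predicted y (y-hat) for a given x."""
    return a + b * x


print(r, a, b)
print(predict(x_bar), y_bar)  # the line passes through (x-bar, y-bar)
```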
Coefficient of determination, r²
r²: the fraction of the variance in y (the vertical scatter of the data) that can be explained by the least-squares regression of y on x.
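One way to see r² as a "fraction explained" is to compute it from sums of squares and compare it with the square of the correlation. This is a minimal sketch on invented data. The names `sse` (unexplained variation around the line) and `sst` (total variation of y around its mean) are labels chosen here, not taken from the notes.

```python
from math import sqrt

# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx            # least-squares slope
a = y_bar - b * x_bar    # least-squares intercept
r = sxy / sqrt(sxx * syy)

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
sst = syy                                                  # total
r_squared = 1 - sse / sst

print(r_squared, r ** 2)  # the two values agree up to rounding error
```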
residual = y - ŷ (observed y minus predicted y)
Residuals are the signed vertical differences between the observed y and the y predicted by the line. We plot them against x in a residual plot.
If the residuals are scattered randomly around 0, chances are that a linear model fits the data, that the residuals are roughly normally distributed, and that there are no outliers.
The x-axis in a residual plot is the same as on the scatterplot.
The horizontal line at 0 on the residual plot corresponds to the regression line on the scatterplot.
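The residuals for a small data set can be computed directly. The sketch below uses invented data and pairs each x with its residual, which is what a residual plot displays. For a least-squares line with an intercept, the residuals sum to zero up to rounding error.

```python
# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Points of the residual plot: same x-axis as the scatterplot,
# residual on the vertical axis, reference line at 0.
residual_points = list(zip(xs, residuals))

for x, e in residual_points:
    print(x, round(e, 2))
print("sum of residuals:", round(sum(residuals), 10))
```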
Outlier:
an observation that lies outside the overall pattern of the other observations.
Influential point:
an observation that markedly changes the regression line if removed. This is often an outlier in the x direction.
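To illustrate how a single observation far out on the x-axis can change the fit, the sketch below fits the line with and without one such point. The data, the extra point (15, 5.0), and the helper name `slope_intercept` are all invented for illustration.

```python
def slope_intercept(xs, ys):
    """Return (slope b, intercept a) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return b, y_bar - b * x_bar


# Invented data following a clear upward pattern.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# One extra observation far to the right on the x-axis that
# does not follow the pattern (hypothetical).
xs_with = xs + [15]
ys_with = ys + [5.0]

b_without, a_without = slope_intercept(xs, ys)
b_with, a_with = slope_intercept(xs_with, ys_with)

print("slope without the point:", round(b_without, 3))
print("slope with the point:   ", round(b_with, 3))
```

Removing the one point moves the slope from nearly flat back to about 2, which is the behavior the definition of an influential point describes.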
The equation of the least-squares regression line allows you to predict y for any x within the range studied. This is called interpolation.
Lurking variable:
a variable not included in the study design that nonetheless has an effect on the variables studied. It can falsely suggest a relationship between them.
Two variables are confounded when their effects on a response variable cannot be
distinguished from each other. The confounded variables may be either explanatory
variables or lurking variables.
Extrapolation:
the use of a regression line for prediction outside the range of x values used to obtain the line. Such predictions are often unreliable.
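A simple guard can make the difference between predicting inside and outside the observed x range explicit. This is a minimal sketch on invented data. The function name `predict_with_flag` is hypothetical and not part of any library.

```python
# Invented data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

x_min, x_max = min(xs), max(xs)


def predict_with_flag(x):
    """Return (predicted y, label) where the label says whether x lies
    inside the range of x values used to fit the line."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind


print(predict_with_flag(3.5))  # x inside the range studied
print(predict_with_flag(12))   # x outside the range: treat with caution
```

The arithmetic is the same in both cases. The flag only records that the second prediction rests on the assumption that the linear pattern continues beyond the data.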