The flashcards below were created by user
bpulsipher
on FreezingBlue Flashcards.

Define bivariate data:
data in which two variables are measured on an individual.
(e.g., we might want to know whether the amount of cola consumed per week is related to a person's bone density. The individuals would be the people in the study, and the two variables would be the amount of cola consumed weekly and bone density.)
Before we can represent bivariate data graphically, we must decide which variable will be used to predict the value of the other variable.

What is the response variable?
The dependent variable: the variable whose value can be explained by the value of the explanatory (predictor or independent) variable.
(dependent on the independent variable)

What is a scatter diagram? What are its axes?
a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram.
The explanatory variable is plotted on the horizontal (x) axis, and the response variable is plotted on the vertical (y) axis.

How can one tell when two variables have a positive association? A negative association?
two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases (directly related).
two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases (inversely related).

Just as we can __________ the scale of graphs of univariate data, we can also __________ the scale of graphs of _________ data, possibly resulting in _________ conclusions.
Therefore, _________ summaries of _________ data should be used __ _________ __ graphs to determine ___ ________ that exists between two variables.
manipulate, manipulate, bivariate, incorrect
numerical, bivariate, in addition to, any relation

What is the linear correlation coefficient? The formula?
a measure of the strength and direction of the linear relation between two quantitative variables.
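r = \frac{\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)}{n - 1}
\bar{x} is the sample mean of the explanatory (independent) variable
s_x is the sample standard deviation of the explanatory (independent) variable
\bar{y} is the sample mean of the response (dependent) variable
s_y is the sample standard deviation of the response (dependent) variable
n is the number of individuals in the sample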
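As a minimal sketch (the function name `linear_correlation` and the sample data are illustrative assumptions, not part of the flashcards), r can be computed by turning each x and y into a z-score, multiplying the paired z-scores, summing the products, and dividing by n - 1:

```python
from statistics import mean, stdev

def linear_correlation(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    # Sum the products of the paired z-scores, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: points that lie exactly on a line with positive slope.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
r = linear_correlation(xs, ys)
print(r)
```

Because these made-up points fall exactly on a straight line with positive slope, r should come out as 1 up to floating-point rounding.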

What is the value of the linear correlation coefficient for a straight line with a positive slope? With a negative slope?
Does a correlation coefficient close to zero imply that there is no relation? Why or why not?
+1 (positive slope), -1 (negative slope)
No, it just implies that there is no linear relation.
The closer the correlation is to +1 or -1, the closer the data values are to falling on a perfectly straight line.

What does a quadratic (U-shaped) relation on a scatter plot mean? What value would the correlation coefficient of a quadratic (U-shaped) relation likely be close to? Does this mean the values on a scatter plot with a quadratic relation have no relation?
It means that the data lie in a "U" shape on the scatter plot.
Zero.
No, it means that the values may have a relation but have little or no linear relation.
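A small illustration of this card, assuming made-up data that follow y = x^2 exactly (the helper name `linear_correlation` is hypothetical): the variables are perfectly related, yet the linear correlation coefficient is essentially zero.

```python
from statistics import mean, stdev

def linear_correlation(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical U-shaped data, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = linear_correlation(xs, ys)
print(abs(r) < 1e-9)  # r is zero up to floating-point rounding
```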

T or F
The correlation coefficient only applies to data that is linearly related.
True

T or F
The linear correlation coefficient is a resistant measure of linear association.
False, the linear correlation coefficient is not a resistant measure of linear association

List the three steps for testing a linear relation:
(1) Determine the absolute value of the correlation coefficient.
(2) Find the critical value for the given sample size in the table of critical values for the correlation coefficient.
(3) If the absolute value of the correlation coefficient (r) is greater than the critical value, we say that a linear relation exists between the two variables. Otherwise, no linear relation exists.
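The three steps can be sketched as a simple comparison. The function name and the critical value used here are hypothetical placeholders; a real critical value has to be looked up for the actual sample size.

```python
def linear_relation_exists(r, critical_value):
    """Steps 1 and 3: compare |r| with the critical value from step 2."""
    return abs(r) > critical_value

# Hypothetical placeholder, NOT taken from a real table of critical values.
critical_value = 0.8

strong = linear_relation_exists(-0.92, critical_value)
weak = linear_relation_exists(0.35, critical_value)
print(strong, weak)
```

Note that the sign of r does not matter for the test; only its absolute value is compared with the critical value.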

T or F (ch.1 recap)
If data used in a study are observational, we can conclude that the two correlated variables have a causal relationship.
F, we cannot conclude that the two correlated variables have a causal relationship.

Is there another way two variables can be correlated without a causal relationship existing? If not or if so, explain.
Yes, through a lurking variable.
Recall that a lurking variable is an explanatory variable that was not considered in the study, but affects the response variable.
Example: does an increase in ice cream sales at the beach cause an increase in shark attacks? Not necessarily. The lurking variable may be higher temperatures in the summer. Higher temperatures may cause more people to go to the beach to cool down (where there are sharks in the water) and to eat ice cream while there.

T or F
Correlation implies causation.
False, correlation does not imply causation

4.2
What is the residual for an observation?
The difference between the observed value of y and predicted value of y. This is the error, or residual. The smaller the residual, the better the prediction.
It can also be thought of as the vertical distance between an observed point on the scatter plot and the regression line.
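A minimal sketch of the residual calculation, assuming a hypothetical line with slope 2 and intercept 1 (the function name and the numbers are illustrative only):

```python
def residual(x, y_observed, slope, intercept):
    """Residual = observed y minus the y predicted by the line at x."""
    y_predicted = slope * x + intercept
    return y_observed - y_predicted

# Hypothetical line y-hat = 2x + 1 and observed point (3, 8).
# The line predicts 7 at x = 3, so the point sits 1 unit above the line.
e = residual(3, 8, slope=2, intercept=1)
print(e)
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.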

4.2
What is the least-squares regression line?
It is the line that minimizes the sum of the squared residuals (the squared errors, or squared vertical distances) between the observed values of y and those predicted by the line, \hat{y}. We represent this as: minimize \sum (y_i - \hat{y}_i)^2.
This is the line that best describes the relation between two variables, and it is based on residuals (errors).

4.2
T or F
The advantage of the least-squares criterion is that it does not allow for statistical inference on the predicted value and slope.
False, it does allow for statistical inference on the predicted value and slope.

4.2
What is the formula for the least-squares regression line? What is the formula for the slope of the least-squares regression line? What is the formula for the y-intercept of this line?
\hat{y} = b_1 x + b_0
where b_1 = r \cdot \frac{s_y}{s_x} is the slope of this line
and b_0 = \bar{y} - b_1 \bar{x} is the y-intercept
r is the linear correlation coefficient
\bar{x} is the sample mean and s_x is the sample standard deviation of the independent (explanatory) variable.
\bar{y} is the sample mean and s_y is the sample standard deviation of the dependent (response) variable.
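As a sketch under stated assumptions (the function name `least_squares_line` and the data are made up for illustration), the slope can be computed as r times s_y / s_x and the y-intercept as the mean of y minus the slope times the mean of x:

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (b1, b0): slope and y-intercept of the least-squares line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b1, b0

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = least_squares_line(xs, ys)
print(round(b1, 4), round(b0, 4))
```

For this data the means are 3 and 4, and the fitted line passes through that point, which is a handy check on any hand calculation.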

4.2
T or F
The least-squares regression line always contains the point (\bar{x}, \bar{y}).
True

4.2
T or F
Because s_x and s_y must both be negative, the sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, b_1, are never the same.
False. Because s_x and s_y must both be positive, the sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, b_1, are the same.

4.2
T or F
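The predicted value of y, \hat{y}, has an interesting interpretation. It is an estimate of the mean value of the explanatory variable for any value of the explanatory variable.
False; it is an estimate of the mean value of the response variable for any value of the explanatory variable.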

4.2
We round the (slope/x-intercept) to (four/three/two/one) decimal places. We round predictions to (one/two/three/none) more decimal place(s) than the raw data of the response (dependent) variable. This is the (same/different) rounding rule we use for the mean.
slope, four, one, same

4.2
What is the difference between interpreting the slope of an ordinary algebraic line (y=mx+b) and the slope of the least-squares regression line (\hat{y} = b_1 x + b_0)?
Interpreting the slope of a least-squares regression line has a minor twist. Statistical models such as a least-squares regression equation are probabilistic. This means that any predictions or interpretations made from the model are subject to uncertainty.
Therefore, when we interpret the slope of a least-squares regression equation, we do not want to imply that there is 100% certainty behind the interpretation.
This is because the least-squares regression equation describes averages. The algebraic line y=mx+b is not based on averages, so we can say that if x increases by one, then y will increase or decrease by exactly the amount the slope indicates. This cannot be said of the least-squares regression line.
With the least-squares regression line, a change in x corresponds to a change in y only "on average" or as "expected".

4.2
The y-intercept on a graph is the point where...
the graph intersects with the vertical axis.

4.2
When interpreting the y-intercept:
In general, we interpret a y-intercept as being the value of the response variable when...
the value of the explanatory variable is "zero". It is found by letting "x" equal zero and solving for y.

4.2
When interpreting the y-intercept, there are two conditions. The first:
In order to interpret the y-intercept, we must first ask two questions. What are they?
(1) Is 0 a reasonable value for the explanatory variable?
(2) Do any observations near x=0 exist in the data set?
If the answer to either question is "no", then we do not interpret (or find) the y-intercept.

4.2
When interpreting the y-intercept, there are two conditions. The second:
We should not use the regression model to make predictions outside the scope of the model.
This means that we should not use the regression model to make predictions for values of the explanatory variable that are much larger or much smaller than those observed.
This is a dangerous practice because we cannot be certain of the behavior of data for which we have no observations.
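One way to enforce the scope of the model in code is to refuse predictions outside the observed range of the explanatory variable. This is only a sketch: the function name, the data, and the strict min/max cutoff are assumptions (the card itself only warns against values "much larger or much smaller" than those observed).

```python
def predict_within_scope(x, slope, intercept, observed_xs):
    """Predict y-hat only when x lies inside the observed range of x."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if not lowest <= x <= highest:
        raise ValueError(
            f"x = {x} is outside the scope of the model "
            f"({lowest} to {highest})"
        )
    return slope * x + intercept

# Hypothetical fitted line y-hat = 0.6x + 2.2 built from x values 1 to 5.
observed_xs = [1, 2, 3, 4, 5]
y_hat = predict_within_scope(3, 0.6, 2.2, observed_xs)

try:
    predict_within_scope(50, 0.6, 2.2, observed_xs)
    out_of_scope_rejected = False
except ValueError:
    out_of_scope_rejected = True

print(y_hat, out_of_scope_rejected)
```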

4.2
When the correlation coefficient indicates (some kind of a/no) linear relation between the ___________ and ________ variables and the scatter diagram indicates (a/no) relation between the variables, then we use the (median/mean/mode) value of the response variable as the _________ value so that \hat{y} = \bar{y}.
no, explanatory (independent), response (dependent), no, mean, predicted

4.2
Recall that the least-squares regression line minimizes the sum of the squared residuals. This means that...
the sum of the squared residuals, \sum (y_i - \hat{y}_i)^2, is smaller for the least-squares line than for any other line that may describe the relation between the two variables.
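To see this numerically, the sketch below compares the sum of squared residuals for the least-squares line with that of one other candidate line. The data, the function names, and the competing line are all made up for illustration; the slope is computed with the algebraically equivalent form S_xy / S_xx rather than r times s_y / s_x.

```python
def least_squares_line(xs, ys):
    """Return (b1, b0) for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # equals r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b1, b0

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b1, b0 = least_squares_line(xs, ys)
sse_best = sum_squared_residuals(xs, ys, b1, b0)
# An arbitrary competing line, chosen only for comparison.
sse_other = sum_squared_residuals(xs, ys, 0.5, 2.5)
print(sse_best < sse_other)
```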

4.2
T or F
When the correlation coefficient indicates no linear relation between the explanatory and response variables and the scatter diagram indicates no relation between the variables, we use the median value of the response variable as the predicted value.
False; we use the mean value, so that \hat{y} = \bar{y}.

4.3
What does the coefficient of determination, R^{2}, measure?
R^{2} measures the percentage of total variation in the response variable that is explained by the least-squares regression line.

4.3
The coefficient of determination is a number between 0 and 1, inclusive. What does it mean if R^{2} = 0? If R^{2} = 1?
If R^{2} = 0, the least-squares regression line has no explanatory value.
If R^{2} = 1, the least-squares regression line explains 100% of the variation in the response variable.

What is deviation?
The difference between predicted values and the actual values. This is due to factors other than the explanatory variable, or to random error.

What is total deviation? Explained deviation? Unexplained deviation? Give the formula for calculating each and for the total deviation.
Total deviation: the deviation between the observed value, y, and mean value, \bar{y}, of the response variable: y - \bar{y}
Explained deviation: the deviation between the predicted value, \hat{y}, and mean value, \bar{y}, of the response variable: \hat{y} - \bar{y}
Unexplained deviation: the deviation between the observed value, y, and predicted value, \hat{y}, of the response variable: y - \hat{y}
total deviation = unexplained deviation + explained deviation, that is, y - \bar{y} = (y - \hat{y}) + (\hat{y} - \bar{y})
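A quick check of the decomposition for a single observation, using hypothetical values for the observed y, the predicted y, and the mean of y (the function name is also an assumption):

```python
def deviations(y, y_hat, y_bar):
    """Return (total, explained, unexplained) deviation for one observation."""
    total = y - y_bar          # observed minus mean
    explained = y_hat - y_bar  # predicted minus mean
    unexplained = y - y_hat    # observed minus predicted (the residual)
    return total, explained, unexplained

# Hypothetical observation: observed 5, predicted 5.2, mean response 4.
total, explained, unexplained = deviations(y=5, y_hat=5.2, y_bar=4)
print(abs(total - (explained + unexplained)) < 1e-9)
```

Here the line over-predicts slightly, so the unexplained deviation is negative while the explained deviation is larger than the total.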

4.3
The (closer/farther) the observed y's are to the regression line (the predicted y's), the (larger/smaller) R^{2} will be.
How can this be described in mathematical terms?
closer, larger
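In mathematical terms, R^{2} = explained variation / total variation = 1 - (unexplained variation / total variation).
It makes sense, then, that the farther the observed y's are from the regression line (the predicted y's), the smaller R^{2} will be.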

4.3
Unexplained variation is found by _______ the _______ of the _________, \sum (y_i - \hat{y}_i)^2.
summing, squares, residuals
A large value (close to 1 or 100%) of R^{2} implies that the unexplained variation is a small portion of the total variation. Equivalently, a large value of R^{2} implies that explained variation is a large portion of the total variation.
Remember that this coefficient is based on the variation of the response variable, not the explanatory variable.
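A minimal sketch of R^{2} as 1 minus unexplained variation over total variation, where both variations are sums of squares taken over the response variable. The function name and data are hypothetical.

```python
def coefficient_of_determination(xs, ys):
    """R^2 = 1 - (unexplained variation / total variation)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept.
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    unexplained_variation = sum((y - (b1 * x + b0)) ** 2
                                for x, y in zip(xs, ys))
    return 1 - unexplained_variation / total_variation

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_squared = coefficient_of_determination(xs, ys)
print(round(r_squared, 4))
```

For this data the total variation is 6 and the unexplained variation is 2.4, so 60% of the variation in the response variable is explained by the line.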

4.3
T or F
Squaring the linear correlation coefficient to obtain the coefficient of determination works only for the least-squares regression model with more than one explanatory variable.
False, with one explanatory variable

4.3
What are the three purposes for which we analyze residuals?
(1) To determine whether a linear model is appropriate to describe the relation between the explanatory and response variables.
(2) To determine whether the variance of the residuals is constant.
(3) To check for outliers.

4.3
What is a residual plot? What is the purpose of a residual plot? What is unexplained variation?
A "scatter plot" where the explanatory variable is plotted on the (x) horizontal axis and the corresponding residual is on the (y) vertical axis.
The purpose of a residual plot is to analyze the unexplained variation present in a data sample (not the explained or total variation).
Unexplained variation (SSE, the sum of squares error, or SSR, the sum of squared residuals) is a measure of the difference between the observed value, y, and the predicted value, \hat{y}.
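The points of a residual plot can be sketched as (x, residual) pairs. This example stops short of drawing the plot so that it needs no plotting library; the function name and data are assumptions.

```python
def residual_plot_points(xs, ys):
    """Return (x, residual) pairs: x on the horizontal axis, residual on the vertical."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return [(x, y - (b1 * x + b0)) for x, y in zip(xs, ys)]

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
points = residual_plot_points(xs, ys)
for x, e in points:
    print(x, round(e, 2))
```

Plotting these pairs and looking for a curve, a fan shape, or an isolated point is how the three purposes of residual analysis are checked in practice.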

4.3
T or F
If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, the response and explanatory variables may be linearly related.
False; they may not be linearly related.

4.3
It is important for the residuals to have constant error variance because...
if the model is used to make predictions, we want our confidence in those predictions to be the same at every value of the explanatory variable, rather than fluctuating at one or more points.

4.3
What is an influential observation?
An observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation.
It significantly affects the least-squares regression line's slope and/or y-intercept. (It also affects the value of the correlation coefficient.)
If the value of the correlation coefficient (r) changes significantly, but the slope and the y-intercept do not, then do not consider the point influential.
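One way to check for influence is to fit the line with and without the suspect observation and compare the slopes. The data below are made up, with one point placed far from the others in the x direction, and the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b1, b0) for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b1, b0

# Hypothetical data; the last point (20, 0) is the suspect observation.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

slope_with, intercept_with = least_squares_line(xs, ys)
slope_without, intercept_without = least_squares_line(xs[:-1], ys[:-1])
print(round(slope_with, 4), round(slope_without, 4))
```

In this made-up case the slope changes sign when the point is removed, so the observation would be considered influential.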

4.3
What do we call the relative vertical position of an observation?
Residual

4.3
What do we call the relative horizontal position of an observation?
Leverage
Leverage is a measure that depends on how much the observation's value of the explanatory variable differs from the mean value of the explanatory variable.

4.3
T or F
As with outliers, influential observations should never be removed.
False, as with outliers, influential observations should be removed only if there is justification to do so.
When an influential observation occurs in a data set and its removal is not warranted, two possible courses of action are to (1) collect more data so that additional points near the influential observation are obtained or (2) use techniques that reduce the influence of the influential observation.

Which of the following conditions below might indicate that a linear model would not be appropriate?
(a) constant error variance
(b) patterned residuals
(c) none
(d) outlier
Residuals are ________ to _________ whether a linear model is ___________ to describe the ________ between the ___________ and ________ variables, to determine whether the ________ of the _________ is ________, and to check for ________.
To determine if a ______ model is ___________, a ________ plot is used, which is a _______ _______ with the _________ on the ________ axis and the ___________ variable on the __________ axis.
(b) patterned residuals
analyzed, determine, appropriate, relation, explanatory (independent), response (dependent), variance, residuals, constant, outliers
linear, appropriate, residual, scatter diagram, residuals, vertical (Y), explanatory (independent), horizontal (X)

The proportion of the variability explained by the relation between the explanatory (independent) and response (dependent) variable is measured by __.
R^{2}

