The flashcards below were created by user Bernard117 on FreezingBlue Flashcards.

What is a response variable?
A response variable measures or records an outcome of a study. It is also called the dependent variable.

What is an explanatory variable?
An explanatory variable explains changes in the response variable. It is also called the independent variable.

What is a scatterplot?
A scatterplot shows the relationship between two quantitative variables measured on the same individuals.
Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis.
Each individual in the data appears as a point in the plot.

How do you interpret a scatterplot?
 Examine:
 Form: linear, curved, clusters, no pattern
 Direction: positive, negative, no direction
 Strength: how closely the points fit the “form”

What is a positive association?
High values of one variable tend to occur together with high values of the other variable.

What is a negative association?
High values of one variable tend to occur together with low values of the other variable.

On a scatterplot, what if the dots form a horizontal band?
 No relationship: X and Y vary independently.
 Knowing X tells you nothing about Y.

What is strength of the association?
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.
With a strong relationship, you can get a pretty good estimate of y if you know x
With a weak relationship, for any x you might get a wide range of y values.

What is the probability of occurrence of an outlier?
Low. An outlier is a point that falls outside the overall pattern of the relationship.

What is the correlation coefficient r?
The correlation coefficient is a measure of the direction and strength of a linear relationship.
It is calculated using the mean and the standard deviation of both the x and y variables.
Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.
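
As a rough sketch of the calculation described above, r can be computed by standardizing each x and y value with its mean and standard deviation, multiplying the pairs, and averaging with n - 1. The function name and the data below are made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r as the (n - 1)-average product of standardized x and y values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply the z-scores of each pair, then divide the sum by n - 1.
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustrative data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(round(r, 3))
```

Here r comes out close to +1, matching the strong positive linear pattern in the made-up data.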

Between what two numbers does r fall?
-1 and +1

Correlation only describes...
Linear relationships. It will never describe curved relationships.
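
A small sketch of why this matters, using made-up data: y is determined exactly by x through a curve (y = x squared), yet r is 0 because there is no linear trend. The helper name `correlation` is an assumption for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfect curved (quadratic) relationship
r = correlation(xs, ys)
print(r)
```

Knowing x tells you y exactly here, but r still reports no linear association.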

Since correlations are calculated using means and standard deviations, are they resistant to outliers?
No.
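
To see how sensitive r can be, the sketch below uses made-up data that lie exactly on a line and then adds a single outlying point. The helper name `correlation` is an assumption for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = correlation(xs, ys)

# The same data plus one outlier at (6, 0).
r_outlier = correlation(xs + [6], ys + [0])

print(round(r_clean, 3), round(r_outlier, 3))
```

One point out of six is enough to pull r from a perfect +1 down to about 0.14.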

What is a regression line?
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

What is the equation for the Regression Line?
$Y = B_0 + B_1 X$
 $Y$ is the predicted response
 $B_1$ is the slope
 $B_0$ is the intercept
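
A minimal sketch of how the intercept and slope can be obtained, assuming the line is fit by least squares (the usual choice). The function name and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = B0 + B1 * X. Returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up illustrative data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
b0, b1 = fit_line(hours, scores)

predicted = b0 + b1 * 3  # predicted response at X = 3
print(b0, b1, predicted)
```

For this made-up data the fitted line is Y = 46 + 6X, so each extra hour is associated with a predicted increase of 6 points.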

Coefficient of determination $r^2$. What does it represent?
$r^2$ represents the fraction of the variance in y that can be explained by changes in x. The remainder is the vertical scatter of the points around the regression line.
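
As a sketch under the assumption of a least-squares line, the explained fraction can be computed as 1 minus the leftover variation around the line divided by the total variation in y, and it agrees with the square of r. Function names and data are made up for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def explained_fraction(xs, ys):
    """1 - (variation left around the line) / (total variation in y)."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    leftover = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - leftover / total

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r_squared = explained_fraction(hours, scores)
print(round(r_squared, 3), round(correlation(hours, scores) ** 2, 3))
```

Both numbers come out the same (about 0.96), meaning roughly 96% of the variation in the made-up scores is accounted for by the line.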

What is Extrapolation?
Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line.
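
One way to guard against this, sketched with hypothetical numbers: record the range of x values used to fit the line and flag any prediction outside it. The line Y = 46 + 6X, the range 1 to 5, and the 100-point exam scale are all assumptions for this example.

```python
def predict(x, b0, b1, x_min, x_max):
    """Return (predicted y, True if x lies outside the fitted x range)."""
    is_extrapolation = x < x_min or x > x_max
    return b0 + b1 * x, is_extrapolation

# Hypothetical line: exam score = 46 + 6 * hours, fit on 1 to 5 hours of study.
b0, b1 = 46, 6
x_min, x_max = 1, 5

inside = predict(3, b0, b1, x_min, x_max)    # within the observed range
outside = predict(20, b0, b1, x_min, x_max)  # far beyond the observed range
print(inside, outside)
```

The prediction at 20 hours is 166, which is impossible on a 100-point exam. The line gives no warning by itself, so the range check has to be done separately.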

Do all y-intercepts make sense?
No. For example, a negative y-intercept would imply a negative blood alcohol content, which is impossible.

What are residuals?
Residuals are the vertical distances from each point to the least-squares regression line (observed y minus predicted y). They give us potentially useful information about the contribution of individual data points to the overall pattern of scatter.
See slide: 49 in chapter 2
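
A minimal sketch of computing residuals, taking each residual as the observed y minus the y predicted by a least-squares line. The function name and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

b0, b1 = fit_line(hours, scores)

# Residual = observed y - predicted y (positive means the point is above the line).
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, scores)]
print(residuals)
```

The residuals from a least-squares line always sum to zero (up to rounding), so it is their individual sizes and pattern that are informative, not their total.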

Should we plot our data before running a regression analysis?
Yes, because you want to know whether the relationship is linear and whether it has outliers. A nonlinear pattern or outliers can make regression lines meaningless or misleading.

What is a lurking variable?
A lurking variable is a variable not included in the study design that does have an effect on the variables studied.
Lurking variables can falsely suggest a relationship.
The lab we did, my example is movies and its options

What is a confounding variable?
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
For example, consider testing commercials. Length alone may not be a problem: a longer commercial is fine.
The number of times viewers watch it may not be a problem either.
But when both are high, this can cause annoyance.

What is a two-way table?
A two-way table describes the relationship between two categorical variables.

What are marginal distributions?
If we want to look at a single variable in isolation, we can look at the distribution of the numbers in the total column or total row.
These are called the marginal distributions because they appear in the right and bottom margins of the table.
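
As a sketch of the idea, the totals in the margins can be computed from a two-way table of counts and turned into proportions. The table below (marital status by job grade) uses made-up counts.

```python
# Hypothetical two-way table of counts: marital status (rows) by job grade (columns).
table = {
    "single":  {"grade 1": 20, "grade 2": 30},
    "married": {"grade 1": 10, "grade 2": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Row totals (right margin) as proportions of the grand total.
row_marginal = {status: sum(row.values()) / grand_total
                for status, row in table.items()}

# Column totals (bottom margin) as proportions of the grand total.
grades = list(next(iter(table.values())))
col_marginal = {grade: sum(row[grade] for row in table.values()) / grand_total
                for grade in grades}

print(row_marginal)
print(col_marginal)
```

Each marginal distribution describes one variable alone: half the people are single, and 30% are in grade 1, regardless of the other variable.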

What is a conditional distribution?
If we isolate a certain value of one variable and look at the distribution of another variable, that is called the conditional distribution.
 For example, if we look only at single men and then look at the distribution of job grade, we are asking: “Under the condition that someone is single, what is the distribution of job grade?”
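
A small sketch of the same example, with made-up counts: restrict the table to one row (single) and divide each count by that row's total. The function name is an assumption for this example.

```python
# Hypothetical two-way table of counts: marital status (rows) by job grade (columns).
table = {
    "single":  {"grade 1": 20, "grade 2": 30},
    "married": {"grade 1": 10, "grade 2": 40},
}

def conditional_distribution(table, row_label):
    """Distribution of the column variable among individuals in one row."""
    row = table[row_label]
    row_total = sum(row.values())
    return {column: count / row_total for column, count in row.items()}

given_single = conditional_distribution(table, "single")
given_married = conditional_distribution(table, "married")
print(given_single)
print(given_married)
```

Comparing the conditional distributions across rows (40% of single people in grade 1 versus 20% of married people) is how an association between the two variables shows up in a two-way table.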

What is Simpson's Paradox?
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's Paradox.
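
The reversal is easiest to see with numbers. The sketch below uses entirely made-up success counts for two hypothetical treatments, A and B, split into easy and hard cases.

```python
# Hypothetical (successes, trials) for two treatments in two patient groups.
results = {
    "A": {"easy cases": (9, 10), "hard cases": (30, 100)},
    "B": {"easy cases": (80, 100), "hard cases": (2, 10)},
}

def rate(successes, trials):
    """Success rate within one group."""
    return successes / trials

def overall_rate(groups):
    """Success rate after combining all groups into one."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials

for group in ("easy cases", "hard cases"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group}: A = {a:.0%}, B = {b:.0%}")

overall_a = overall_rate(results["A"])
overall_b = overall_rate(results["B"])
print(f"combined: A = {overall_a:.0%}, B = {overall_b:.0%}")
```

Treatment A has the higher success rate in both groups, yet B looks better once the groups are combined. The reversal happens because A was used mostly on hard cases and B mostly on easy ones, so case difficulty acts as a lurking variable.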

