The flashcards below were created by user
on FreezingBlue Flashcards.
What is a response variable?
A response variable measures or records an outcome of a study. It is also called the dependent variable.
What is an explanatory variable?
An explanatory variable explains changes in the response variable. It is also called the independent variable.
What is a scatterplot?
A scatterplot shows the relationship between two quantitative variables measured on the same individuals.
Typically, the explanatory or independent variable is plotted on the x axis, and the response or dependent variable is plotted on the y axis.
- Each individual in the data appears as a point in the plot.
How do you interpret a scatterplot?
- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the “form”
What is a positive association?
High values of one variable tend to occur together with high values of the other variable.
What is a negative association?
High values of one variable tend to occur together with low values of the other variable.
On a scatterplot, what does it mean if the points form a horizontal band?
- No relationship: X and Y vary independently.
- Knowing X tells you nothing about Y.
What is strength of the association?
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.
With a strong relationship, you can get a pretty good estimate of y if you know x.
With a weak relationship, for any x you might get a wide range of y values.
What is the probability of occurrence of an outlier?
Low! An outlier is a point that falls outside the overall pattern of the relationship.
What is the correlation coefficient r?
The correlation coefficient is a measure of the direction and strength of a linear relationship.
It is calculated using the mean and the standard deviation of both the x and y variables.
Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.
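As a rough illustration of how r is built from the means and standard deviations, here is a minimal Python sketch. It standardizes each x and y value, multiplies the pairs of standardized values, and divides the sum by n - 1. The function name and the data are made up for this example and are not part of the course material.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs (divide by n - 1)
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Made-up data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]

r = correlation(hours_studied, exam_score)
print(round(r, 3))  # a value close to +1: strong positive linear association
```

Because every value is standardized first, r has no units and does not change if x or y is rescaled (for example, hours to minutes).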
Between what two numbers does r fall?
-1 and +1
Correlation only describes...
Linear relationships. It does not describe curved relationships, no matter how strong they are.
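Since correlations are calculated using means and SDs, are they resistant to outliers?
No. Means and standard deviations are not resistant to outliers, so r is not resistant either. A single outlier can change r substantially.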
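A small sketch of why r is not resistant: the made-up data below lie exactly on a line, and then one outlying point is added. The helper name pearson_r and all values are assumptions chosen for illustration.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Six made-up points that fall exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(xs, ys)

# Add one point far outside the pattern.
r_outlier = pearson_r(xs + [7], ys + [0])

print(r_clean, r_outlier)
```

With these numbers a single outlier pulls r from a perfect +1 down to roughly 0.25, which is why it helps to look at the scatterplot before trusting r.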
What is a regression line?
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
What is the equation for the regression line?
ŷ = b0 + b1x
- ŷ is the predicted response
- b1 is the slope
- b0 is the intercept
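For reference, a minimal sketch of how the slope and intercept of a least-squares regression line can be computed by hand. The function names and the data are hypothetical, chosen so the points fall exactly on y = 1 + 2x.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted response for a given x."""
    return intercept + slope * x

# Made-up data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = least_squares_line(xs, ys)
print(intercept, slope, predict(intercept, slope, 3))
```

The slope says how much the predicted response changes when x increases by one unit; the intercept is the predicted response at x = 0.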
What does the coefficient of determination, r², represent?
r² is the fraction of the variance in y that can be explained by the regression on x. The remainder is the vertical scatter of the points around the regression line.
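One way to see the "variance explained" idea is to compare the scatter around the regression line with the total scatter of y around its mean. The sketch below, on made-up data, computes 1 - SSE/SST and checks that it matches the square of r. Variable names are assumptions for this example.

```python
from statistics import mean

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y (SST)

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Variation left over around the regression line (SSE).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
r_squared = 1 - sse / syy

print(r_squared, r ** 2)
```

For these numbers both expressions come out to about 0.6, meaning roughly 60% of the variation in y is accounted for by the line.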
What is Extrapolation?
Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line.
Do all y-intercepts make sense?
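No. The intercept is the predicted y when x = 0, and that prediction can be impossible or meaningless. For example, someone cannot have a negative blood alcohol content.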
What are residuals?
Residuals are the vertical distances from each point to the least-squares regression line (observed y minus predicted y). They give us potentially useful information about the contribution of individual data points to the overall pattern of scatter.
See slide: 49 in chapter 2
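A short sketch of computing residuals, taking each residual as observed y minus predicted y from a least-squares line. The data and names are made up for illustration.

```python
from statistics import mean

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the least-squares line.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y (vertical distance, with sign).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(x, round(res, 2))
```

Points above the line have positive residuals and points below have negative ones. For a least-squares line the residuals always sum to zero, so a residual plot is centered on the horizontal line at 0.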
Should we plot our data before running a regression analysis?
Yes, because you want to know whether the relationship is linear and whether there are outliers. Nonlinearity and outliers can make a regression line meaningless or misleading.
What is a lurking variable?
A lurking variable is a variable not included in the study design that does have an effect on the variables studied.
Lurking variables can falsely suggest a relationship.
The lab we did, my example is movies and its options
What is a confounding variable?
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
So for example, testing commercials. Length may not be a problem, longer is fine.
How many times they watch it may not be a problem
but when both are high, this can cause annoyance.
What is a two-way table?
A two-way table describes the relationship between two categorical variables.
What are marginal distributions?
If we want to look at a single variable in isolation, we can look at the distribution of the numbers in the total column or total row.
These are called the marginal distributions because they appear in the right and bottom margins of the table.
What is a conditional distribution?
If we isolate a certain value of one variable and look at the distribution of another variable, then that is called the conditional distribution.
- For example, if we look only at single men and then look at the distribution of job grade, we are asking, “Under the condition that someone is single, what is the distribution of job grade?”
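To make marginal and conditional distributions concrete, here is a minimal sketch using a two-way table of marital status by job grade. The counts are invented for this example and do not come from the course data.

```python
# Hypothetical counts: rows are marital status, columns are job grade.
table = {
    "single":  {"grade 1": 40, "grade 2": 50,  "grade 3": 10},
    "married": {"grade 1": 60, "grade 2": 250, "grade 3": 90},
}
grades = ["grade 1", "grade 2", "grade 3"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of marital status: row totals / grand total.
marginal_status = {
    status: sum(row.values()) / grand_total for status, row in table.items()
}

# Marginal distribution of job grade: column totals / grand total.
marginal_grade = {
    g: sum(table[status][g] for status in table) / grand_total for g in grades
}

# Conditional distribution of job grade, given that someone is single:
# counts in the "single" row / total of the "single" row.
single_total = sum(table["single"].values())
grade_given_single = {g: table["single"][g] / single_total for g in grades}

print(marginal_status)
print(marginal_grade)
print(grade_given_single)
```

The marginal distributions divide by the grand total, while the conditional distribution divides by the total of the one row being conditioned on. Each distribution sums to 1.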
What is Simpson's Paradox?
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's Paradox.
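The reversal is easier to believe with numbers. The sketch below uses invented success counts for two hypothetical treatments, A and B, in two groups of patients. A has the higher success rate within each group, yet B has the higher rate once the groups are combined, because A was used mostly on the harder group.

```python
# Invented (successes, trials) counts for illustration only.
data = {
    "mild cases":   {"A": (9, 10),   "B": (80, 100)},
    "severe cases": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Success rate of each treatment within each group.
within_group = {
    group: {t: rate(*counts) for t, counts in treatments.items()}
    for group, treatments in data.items()
}

# Success rate of each treatment after combining the groups.
combined = {}
for t in ("A", "B"):
    successes = sum(data[group][t][0] for group in data)
    trials = sum(data[group][t][1] for group in data)
    combined[t] = rate(successes, trials)

print(within_group)
print(combined)
```

Here the group (mild or severe) acts as a lurking variable: it affects both which treatment was given and how likely a success was.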