The flashcards below were created by user tnrose87 on FreezingBlue Flashcards.

Backward Regression
Start with all k predictors and remove the predictor with the smallest (k − 1)th-order squared semipartial correlation (i.e., the predictor whose removal lowers R^{2} the least). If removing it causes a significant drop in R^{2}, stop and keep the model that still includes that predictor.
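
A minimal Python sketch of backward elimination, assuming R^{2} for any subset of predictors is computed from a correlation matrix. The matrix, the sample size, and the names `solve`, `r_squared`, and `backward` are hypothetical. The R^{2} lost when a predictor is dropped equals its squared semipartial, and a fixed critical F (3.94, roughly α = .05 for 1 and about 100 df) stands in for a p-value because the standard library has no F distribution.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(rxx, rxy, subset):
    """R^2 of y on the predictors in `subset`: standardized betas times r_xy."""
    if not subset:
        return 0.0
    betas = solve([[rxx[i][j] for j in subset] for i in subset], [rxy[i] for i in subset])
    return sum(b * rxy[i] for b, i in zip(betas, subset))


def backward(rxx, rxy, n, f_crit):
    model = list(range(len(rxy)))  # start with all k predictors
    while model:
        r2_full = r_squared(rxx, rxy, model)
        # R^2 lost when each predictor is dropped = its squared semipartial
        loss = {j: r2_full - r_squared(rxx, rxy, [i for i in model if i != j]) for j in model}
        j = min(loss, key=loss.get)
        f_change = loss[j] / ((1 - r2_full) / (n - len(model) - 1))
        if f_change > f_crit:  # significant drop in R^2: keep j and stop
            break
        model.remove(j)
    return model


# Hypothetical correlations: x0 and x1 correlate .3; x2 is unrelated to both.
rxx = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
rxy = [0.6, 0.5, 0.05]
print(backward(rxx, rxy, n=100, f_crit=3.94))  # [0, 1]
```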

Confounding Variable
A variable that affects two or more of the variables being examined, so it can create or distort the apparent relationship between them. E.g., age is a confounding variable when examining the relationship between math scores and height.

Semipartial Correlations (r_{1(2.3)})
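The Pearson product-moment correlation (PPMCC) between two variables after the effects of one or more other variables have been removed (partialled) from only one of the two variables. When squared, AKA the Uniqueness Index (a predictor's unique contribution to R^{2}).
The predictor with the largest Beta tends to have the highest semipartial; because sr_{j} = Beta_{j} × √Tolerance_{j}, a predictor with low tolerance can have a large Beta but a smaller semipartial.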
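
For one outcome (1) and two predictors (2 and 3), the semipartial can be computed from correlations as r_{1(2.3)} = (r_{12} − r_{13}r_{23})/√(1 − r_{23}^{2}). This sketch, using hypothetical correlations, checks that its square equals the uniqueness index, the R^{2} gained by adding predictor 2 to a model that already contains predictor 3.

```python
import math


def semipartial(r12, r13, r23):
    """r_{1(2.3)}: variable 3 is partialled out of variable 2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)


def r2_two_predictors(r1a, r1b, rab):
    """R^2 for predicting variable 1 from predictors a and b."""
    return (r1a ** 2 + r1b ** 2 - 2 * r1a * r1b * rab) / (1 - rab ** 2)


r12, r13, r23 = 0.5, 0.4, 0.3  # hypothetical correlations
sr = semipartial(r12, r13, r23)
# Uniqueness index: R^2 with both predictors minus R^2 with predictor 3 alone
unique = r2_two_predictors(r12, r13, r23) - r13 ** 2
print(round(sr ** 2, 4), round(unique, 4))  # 0.1587 0.1587
```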

Spurious Correlation
A correlation between two variables that does not result from any direct relation between them but from their relation to other variables. E.g., height and math scores: the correlation appears because of a third variable, age, and when age is controlled for, the correlation disappears.
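
A small simulation of the height/math example, under the assumption (hypothetical data) that age drives both height and math scores and there is no direct link between them. Removing age from both variables by simple regression and correlating the residuals should leave a correlation near zero, while the raw correlation is large.

```python
import random


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def residuals(y, x):
    """Residuals of y after a simple regression on x (removes x's effect)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(1)
age = [random.uniform(6, 12) for _ in range(5000)]
height = [5 * a + random.gauss(0, 5) for a in age]         # age -> height
math_score = [10 * a + random.gauss(0, 10) for a in age]   # age -> math; no height -> math link

r_raw = pearson(height, math_score)
r_partial = pearson(residuals(height, age), residuals(math_score, age))
print(round(r_raw, 2), round(r_partial, 2))
```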

Partial Correlation (r_{12.3})
The PPMCC between two variables after the effects of one or more other variables have been removed (i.e., partialled) from both variables. It gives the strength of the relationship between the two variables with the other variable(s) held constant.
 Notation is symmetric: r_{12.3} = r_{21.3}
 ***SAS prints the squared partial correlation, so take the square root (and give it the sign of the regression coefficient).***
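
A sketch of the first-order partial correlation formula r_{12.3} = (r_{12} − r_{13}r_{23})/√((1 − r_{13}^{2})(1 − r_{23}^{2})), with hypothetical correlations. It also shows the symmetry of the notation and how to turn a squared value from a printout back into r; the helper name `from_squared` is illustrative only.

```python
import math


def partial(r12, r13, r23):
    """First-order partial correlation r_{12.3}: variable 3 removed from both 1 and 2."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def from_squared(pr2, b):
    """Recover a partial r from its squared value, using the sign of coefficient b."""
    return math.copysign(math.sqrt(pr2), b)


r12, r13, r23 = 0.5, 0.4, 0.3  # hypothetical correlations
print(partial(r12, r13, r23))  # r_{12.3}
print(partial(r12, r23, r13))  # r_{21.3}: same value
print(from_squared(0.25, -1.3))  # -0.5
```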

Suppressor Variable
A variable that is uncorrelated with one of the original variables, yet when it is partialled out, the apparent relationship between the two original variables increases.
E.g., math computation scores vs. math word-problem scores: reading is a suppressor variable because partialling it out removes reading-related variance from word-problem scores, which increases the relationship between computation and word-problem scores.
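
A worked check of the suppressor effect with hypothetical correlations: reading is uncorrelated with computation (r = 0) but correlated .6 with word problems. Partialling reading out raises the computation/word-problem correlation from .5 to .625.

```python
import math


def partial(r12, r13, r23):
    """r_{12.3}: variable 3 partialled out of variables 1 and 2."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical correlations: 1 = computation, 2 = word problems, 3 = reading
r_comp_word = 0.5
r_comp_read = 0.0  # reading is uncorrelated with computation
r_word_read = 0.6
r_partial = partial(r_comp_word, r_comp_read, r_word_read)
print(r_comp_word, round(r_partial, 3))  # 0.5 0.625
```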

Detecting Multicollinearity
Tolerance = 1 − R^{2}_{j} (R^{2} from regressing predictor j on the other predictors): lower values = higher multicollinearity (≤ .2)
VIF = 1/Tolerance: higher values = higher multicollinearity (≥ 4)
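
A sketch of both diagnostics for three predictors, assuming hypothetical intercorrelations in which x1 and x2 correlate .9. Each R^{2}_{j} comes from regressing one predictor on the other two, and the cards' cutoffs (tolerance ≤ .2, VIF ≥ 4) are applied.

```python
def r2_from_two(r_ja, r_jb, r_ab):
    """R^2 from regressing predictor j on the two other predictors a and b."""
    return (r_ja ** 2 + r_jb ** 2 - 2 * r_ja * r_jb * r_ab) / (1 - r_ab ** 2)


# Hypothetical predictor intercorrelations
r12, r13, r23 = 0.9, 0.3, 0.3
r2 = {
    "x1": r2_from_two(r12, r13, r23),
    "x2": r2_from_two(r12, r23, r13),
    "x3": r2_from_two(r13, r23, r12),
}
flags = {}
for name, r2_j in r2.items():
    tol = 1 - r2_j
    vif = 1 / tol
    flags[name] = tol <= 0.2 or vif >= 4
    print(f"{name}: tolerance={tol:.3f} VIF={vif:.2f} flagged={flags[name]}")
```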

Collinearity
When a predictor has a perfect or near-perfect linear relationship with one or more other predictors (the dependent variable is not involved).

Residual
 Actual value − predicted value (y − ŷ)
 Squaring and summing the residuals gives SS error

Stepwise Regression
Combination of forward and backward regression. Start with the single best predictor; at each step, add the best available predictor given what is already in the equation if it produces a significant change in R^{2}, then remove any predictors that no longer contribute. Predictors must be significant to get into and to stay in the model.
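
A hedged sketch of stepwise selection built on the same R^{2}-from-correlations idea. The correlation matrix, sample size, and function names are hypothetical; F-to-enter and F-to-remove thresholds (both assumed to be 3.94 here) replace p-values, and a removed predictor is not reconsidered so the loop cannot cycle (real stepwise procedures may let it re-enter).

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(rxx, rxy, subset):
    if not subset:
        return 0.0
    betas = solve([[rxx[i][j] for j in subset] for i in subset], [rxy[i] for i in subset])
    return sum(b * rxy[i] for b, i in zip(betas, subset))


def stepwise(rxx, rxy, n, f_in, f_out):
    model, remaining = [], list(range(len(rxy)))
    while remaining:
        r2_old = r_squared(rxx, rxy, model)
        gains = {j: r_squared(rxx, rxy, model + [j]) - r2_old for j in remaining}
        j = max(gains, key=gains.get)  # best available predictor
        r2_new = r2_old + gains[j]
        if gains[j] / ((1 - r2_new) / (n - len(model) - 2)) <= f_in:
            break  # best candidate is not significant: stop
        model.append(j)
        remaining.remove(j)
        # Removal check: drop predictors that no longer contribute
        for k in list(model):
            r2_full = r_squared(rxx, rxy, model)
            loss = r2_full - r_squared(rxx, rxy, [i for i in model if i != k])
            if loss / ((1 - r2_full) / (n - len(model) - 1)) < f_out:
                model.remove(k)
    return model


rxx = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical
rxy = [0.6, 0.5, 0.05]
print(stepwise(rxx, rxy, n=100, f_in=3.94, f_out=3.94))  # [0, 1]
```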

Model Building
 Based on Theory
 The process to find the least complicated and best fitting model

R^{2}
R^{2} = (SStotal − SSerror)/SStotal

Outliers: DFBETA
Indicates the change in b when an observation is deleted; the standardized version (DFBETAS) is the one compared with the cutoff.
Cutoff is |DFBETAS| > 2/√n
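
This pure-Python sketch computes DFBETAS for the slope of a simple regression by refitting without each observation and dividing the change in b by s_(i)/√Sxx (root MSE from the deleted fit over √Sxx). The data are hypothetical, with the last point placed well off the line.

```python
def fit(xs, ys):
    """Simple least-squares fit; returns intercept, slope, Sxx and SSE."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, sxx, sse


def dfbetas_slope(xs, ys):
    """Standardized change in the slope when each observation is deleted."""
    n = len(xs)
    _, b, sxx, _ = fit(xs, ys)
    out = []
    for i in range(n):
        _, b_i, _, sse_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        s_i = (sse_i / (n - 1 - 2)) ** 0.5  # root MSE without obs i (2 parameters)
        out.append((b - b_i) / (s_i / sxx ** 0.5))
    return out


xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 35.0]  # hypothetical
cutoff = 2 / len(xs) ** 0.5
d = dfbetas_slope(xs, ys)
print([i for i, v in enumerate(d) if abs(v) > cutoff])
```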

Cook's D
Indicates influence by taking into account both the size of the error and the leverage.
Should be <1
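
A sketch of Cook's D for a simple regression, D_i = e_i^{2}/(p × MSE) × h_i/(1 − h_i)^{2}, which combines the residual size (e_i) with the leverage (h_i). The data are the same kind of hypothetical set with one point far off the line.

```python
def cooks_d(xs, ys):
    n, p = len(xs), 2  # p = parameters (intercept + slope)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    e = [y - (a + b * x) for x, y in zip(xs, ys)]  # residuals
    mse = sum(ei ** 2 for ei in e) / (n - p)
    h = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # leverage
    return [ei ** 2 / (p * mse) * hi / (1 - hi) ** 2 for ei, hi in zip(e, h)]


xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 35.0]  # hypothetical
d = cooks_d(xs, ys)
print([i for i, v in enumerate(d) if v >= 1])
```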

Outliers: Studentized Residuals
t value obtained by dividing the residual by its standard error
Cutoff is |value| > 2
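
A sketch of internally studentized residuals, e_i/(s√(1 − h_i)), on hypothetical data; the externally studentized (deleted) version would instead use s from a fit without observation i.

```python
def studentized_residuals(xs, ys):
    """Internally studentized residuals: e_i / (s * sqrt(1 - h_i))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    e = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = (sum(ei ** 2 for ei in e) / (n - 2)) ** 0.5  # root MSE
    h = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # leverage
    return [ei / (s * (1 - hi) ** 0.5) for ei, hi in zip(e, h)]


xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 35.0]  # hypothetical
r = studentized_residuals(xs, ys)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
print(flagged)  # [9]
```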

Regression Line
All of the ŷs ("y hats"), i.e., the predicted y values, fall on the line

SSerror
Residual SS or Unexplained SS
Sum of squared deviations between observed and predicted values: SSerror = Σ(y − ŷ)^{2}

SSmodel
 Explained SS or Regression SS
 Sum of squared deviations between predicted values and the mean of the dependent variable: SSmodel = Σ(ŷ − ȳ)^{2}

SStotal
 Sum of squared deviations between observed values and the mean of the dependent variable
 SStotal = Σ(y − ȳ)^{2}
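
The three sums of squares and the R^{2} formula can be checked on a tiny hypothetical data set with one predictor: SStotal splits into SSmodel + SSerror, and R^{2} = (SStotal − SSerror)/SStotal.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # hypothetical data

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]  # every predicted value lies on the regression line

residuals = [y - yh for y, yh in zip(ys, y_hat)]  # actual - predicted
ss_error = sum(e ** 2 for e in residuals)
ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_total = sum((y - y_bar) ** 2 for y in ys)
r2 = (ss_total - ss_error) / ss_total
print(round(ss_total, 6), round(ss_model, 6), round(ss_error, 6), round(r2, 6))
# 6.0 3.6 2.4 0.6
```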

Assumptions of LR
 *The true conditional probabilities are a logistic function of the independent variables
 *No important variables are omitted
 *No extraneous variables are included
 *The independent variables are measured without error
 *The observations are independent
 *The independent variables are not linear combinations of each other

LR: Probability
pi (range: 0 to 1)

Model Effect Size
 R^{2}_{L} = [(−2LL_{null}) − (−2LL_{k})]/(−2LL_{null})
 Explains proportion of null deviance accounted for by predictors
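
A one-line sketch of R^{2}_{L} with hypothetical −2LL values (100 for the null model, 80 with k predictors): the predictors account for 20% of the null deviance.

```python
def r2_l(neg2ll_null, neg2ll_k):
    """Proportion of the null deviance accounted for by the k predictors."""
    return (neg2ll_null - neg2ll_k) / neg2ll_null


print(r2_l(100.0, 80.0))  # 0.2
```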

Model Fit
 χ^{2} = (−2LL_{small}) − (−2LL_{large}); compare it with the critical χ^{2} value
 df = #predictors_{large} − #predictors_{small}
 −2LL is also known as deviance or misfit
***Want to see a drop in deviance***
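
A sketch of the deviance (likelihood-ratio) comparison with hypothetical −2LL values. Because the standard library has no χ^{2} quantile function, the critical value is passed in (5.991 for df = 2 at α = .05); the function name is illustrative.

```python
def deviance_test(neg2ll_small, neg2ll_large, k_small, k_large, chi2_crit):
    chi2 = neg2ll_small - neg2ll_large  # drop in deviance from adding predictors
    df = k_large - k_small
    return chi2, df, chi2 > chi2_crit


chi2, df, sig = deviance_test(120.4, 110.2, k_small=1, k_large=3, chi2_crit=5.991)
print(round(chi2, 2), df, sig)  # 10.2 2 True
```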

Predicted Probabilities
predicted log(odds) = .2 + .5x_{1} + .1x_{2}
predicted odds = e^{.2 + .5x_{1} + .1x_{2}}
 P_{A} = 1/(1 + e^{−(.2 + .5x_{1} + .1x_{2})})
 Replace x_{1} and x_{2} with the given values
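
A sketch of the three predicted quantities using the card's coefficients and hypothetical given values x_{1} = 1 and x_{2} = 2. The probability equals odds/(1 + odds), which is the same as the logistic formula.

```python
import math


def predicted(x1, x2, b0=0.2, b1=0.5, b2=0.1):
    log_odds = b0 + b1 * x1 + b2 * x2
    odds = math.exp(log_odds)
    p = 1 / (1 + math.exp(-log_odds))
    return log_odds, odds, p


log_odds, odds, p = predicted(x1=1, x2=2)
print(round(log_odds, 2), round(odds, 3), round(p, 3))  # 0.9 2.46 0.711
```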

LR: Odds Ratios
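predicted log(odds) = .2 + .5x_{1} + .1x_{2}
predicted odds = e^{.2 + .5x_{1} + .1x_{2}}
OR for a one-unit increase in a predictor = e^{b} (e.g., e^{.5} for x_{1})
(OR − 1)*100 = % difference in odds; when OR < 1, (1 − OR)*100 = % decrease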
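
A sketch, assuming the card's coefficient .5 for x_{1}: the odds ratio e^{.5} ≈ 1.649 means about 64.9% higher odds per one-unit increase in x_{1}, and it matches the ratio of predicted odds one unit apart (x_{2} held at a hypothetical value of 2).

```python
import math

b1 = 0.5  # coefficient of x1 from the card
odds_ratio = math.exp(b1)  # multiplicative change in odds per one-unit increase in x1
pct = (odds_ratio - 1) * 100  # % difference in odds


def odds(x1, x2):
    return math.exp(0.2 + 0.5 * x1 + 0.1 * x2)


ratio = odds(3, 2) / odds(2, 2)  # same OR, computed from predicted odds
print(round(odds_ratio, 3), round(pct, 1))  # 1.649 64.9
```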

LR: Log_{e}(Odds)
ln(odds) (range: −infinity to +infinity)

LR: Odds Ratio
Odds for Group A/Odds for Group B

LR: Odds
pi/(1 − pi) (range: 0 to +infinity)
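
A sketch of the conversions between probability, odds, and log odds, plus an odds ratio between two groups with hypothetical probabilities (.6 for Group A, .4 for Group B).

```python
import math


def odds(p):
    return p / (1 - p)  # 0 to +infinity


def log_odds(p):
    return math.log(odds(p))  # -infinity to +infinity


def prob_from_log_odds(z):
    return 1 / (1 + math.exp(-z))  # back to the 0..1 scale


p_a, p_b = 0.6, 0.4  # hypothetical group probabilities
or_ab = odds(p_a) / odds(p_b)  # odds for Group A / odds for Group B
print(round(odds(p_a), 3), round(odds(p_b), 3), round(or_ab, 2))  # 1.5 0.667 2.25
print(round(prob_from_log_odds(log_odds(p_a)), 6))  # 0.6
```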

Forward Regression
Start with the single best predictor (the one with the largest correlation with the dependent variable), then keep adding the best available predictor given what is already in the equation as long as it produces a significant change in R^{2}.
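
A minimal sketch of forward selection, assuming a hypothetical correlation matrix and a fixed critical F of 3.94 in place of a p-value; `solve`, `r_squared`, and `forward` are illustrative names. Unlike stepwise regression, a predictor that has entered is never removed.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(rxx, rxy, subset):
    if not subset:
        return 0.0
    betas = solve([[rxx[i][j] for j in subset] for i in subset], [rxy[i] for i in subset])
    return sum(b * rxy[i] for b, i in zip(betas, subset))


def forward(rxx, rxy, n, f_crit):
    model, remaining = [], list(range(len(rxy)))
    while remaining:
        r2_old = r_squared(rxx, rxy, model)
        gains = {j: r_squared(rxx, rxy, model + [j]) - r2_old for j in remaining}
        j = max(gains, key=gains.get)  # best available predictor given the model
        r2_new = r2_old + gains[j]
        f_change = gains[j] / ((1 - r2_new) / (n - len(model) - 2))
        if f_change <= f_crit:  # no significant change in R^2: stop
            break
        model.append(j)
        remaining.remove(j)
    return model


rxx = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical
rxy = [0.6, 0.5, 0.05]
print(forward(rxx, rxy, n=100, f_crit=3.94))  # [0, 1]
```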

