The flashcards below were created by user floydbre on FreezingBlue Flashcards.

Biserial correlation
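a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The biserial correlation coefficient is used when one variable is a continuous dichotomy, that is, when an underlying continuum exists between the two categories (for example, passing or failing a test: some people only just fail while others fail by a large margin).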

Bivariate correlation
a correlation between two variables

Coefficient of determination
the proportion of variance in one variable explained by a second variable. It is the Pearson correlation coefficient squared.

Covariance
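a measure of the 'average' relationship between two variables. It is the average cross-product deviation, that is, the sum of the cross-product deviations divided by one less than the number of observations (N - 1).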

Cross-product deviations
a measure of the 'total' relationship between two variables. It is the deviation of one variable from its mean multiplied by the other variable's deviation from its mean
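To make the covariance and cross-product definitions concrete, here is a minimal Python sketch using only the standard library. The function names and the five-pair data set are hypothetical, chosen for illustration. It sums the cross-product deviations (the 'total' relationship) and then divides by N - 1 to get the covariance (the 'average' relationship).

```python
def cross_product_deviations(x, y):
    """Sum of (x_i - mean of x) * (y_i - mean of y): the 'total' relationship."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length sequences of 2+ values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


def covariance(x, y):
    """Sample covariance: the cross-product deviations averaged over N - 1."""
    return cross_product_deviations(x, y) / (len(x) - 1)


# Hypothetical data, for illustration only
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]
print(cross_product_deviations(x, y))  # about 17
print(covariance(x, y))                # about 4.25
```

Dividing by N - 1 rather than N gives the usual sample estimate. Either way, the covariance still depends on the units of x and y, which is why the standardized coefficients below are needed.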

Kendall's tau
a non-parametric correlation coefficient similar to Spearman's correlation coefficient, but it should be used in preference to Spearman's for a small data set with a large number of tied ranks
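The card does not say which version of tau it means. The sketch below assumes tau-b, the variant whose denominator is adjusted for tied ranks. It counts concordant and discordant pairs directly. This takes O(n²) time, which is acceptable for the small data sets the card has in mind. The function name and the judges' data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, scaled for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0  # number of pairs NOT tied on x (on y)
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:    # both variables move in the same direction
            concordant += 1
        elif dx * dy < 0:  # they move in opposite directions
            discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical rankings from two judges, with several tied ranks
judge_a = [1, 2, 2, 3, 3, 3]
judge_b = [1, 1, 2, 2, 3, 3]
print(kendall_tau_b(judge_a, judge_b))
```

Pairs tied on both variables count as neither concordant nor discordant. They also drop out of both terms in the denominator.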

Partial correlation
a measure of the relationship between two variables while 'controlling' the effect that one or more additional variables have on both
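With a single control variable z, the partial correlation can be computed from three ordinary Pearson correlations. This is the standard first-order formula. The sketch below assumes exactly one control variable, and its function names and data are hypothetical:

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling z in both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data, for illustration only
x = [2, 4, 5, 7, 9, 10]
y = [1, 3, 3, 6, 8, 9]
z = [1, 2, 4, 4, 5, 7]
print(pearson(x, y), partial_correlation(x, y, z))
```

When z is uncorrelated with both x and y, the formula reduces to the plain correlation r_xy, because there is nothing to control for.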

Pearson correlation coefficient
also called Pearson's product-moment correlation coefficient (its full name), a standardized measure of the strength of relationship between two variables. It can take any value from -1 (as one variable changes, the other changes in the opposite direction by the same amount), through 0 (as one variable changes, the other doesn't change at all), to +1 (as one variable changes, the other changes in the same direction by the same amount).
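Pearson's r standardizes the covariance by dividing it by the product of the two standard deviations. That division is what confines r to the -1 to +1 range. Squaring r gives the coefficient of determination. A minimal standard-library sketch with hypothetical data:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: the covariance divided by the product of the two SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length sequences of 2+ values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


r = pearson_r([5, 4, 4, 6, 8], [8, 9, 10, 13, 15])  # hypothetical data
print(r)       # strength and direction, between -1 and +1
print(r ** 2)  # coefficient of determination (proportion of shared variance)
```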

Point-biserial correlation
a standardized measure of the strength of relationship between two variables when one of the two variables is dichotomous. The point-biserial correlation coefficient is used when the dichotomy is a discrete, or true, dichotomy. An example of this is pregnancy: you are either pregnant or not; there is no in-between.
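The sketch below computes the point-biserial coefficient from the difference between group means. The result is the same as Pearson's r on 0/1 codes. It then converts that value to the biserial coefficient from the earlier card. The conversion assumes a normally distributed continuum underlies the two categories, and it divides by the height of the standard normal curve at the point that splits the cases in proportions p and 1 - p. Function names and data are hypothetical, and `statistics.NormalDist` requires Python 3.8+.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def point_biserial(group, scores):
    """r_pb for a discrete dichotomy coded 0/1 and a continuous variable.

    Gives the same value as Pearson's r on the 0/1 codes and the scores.
    """
    if len(group) != len(scores):
        raise ValueError("group and scores must have the same length")
    ones = [s for g, s in zip(group, scores) if g == 1]
    zeros = [s for g, s in zip(group, scores) if g == 0]
    if len(ones) + len(zeros) != len(scores) or not ones or not zeros:
        raise ValueError("group must contain only 0/1 codes, both present")
    p = len(ones) / len(scores)  # proportion of cases coded 1
    return (mean(ones) - mean(zeros)) / pstdev(scores) * sqrt(p * (1 - p))


def biserial_from_point_biserial(r_pb, p):
    """Convert r_pb to the biserial r_b for a continuous dichotomy.

    Assumes a normal continuum underlies the categories; y is the height
    of the standard normal curve at the split point.
    """
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return r_pb * sqrt(p * (1 - p)) / y


# Hypothetical data: 0/1 group codes and a continuous score
group = [0, 0, 0, 1, 1, 1, 1, 0]
scores = [2, 4, 3, 8, 7, 9, 5, 5]
r_pb = point_biserial(group, scores)
print(r_pb, biserial_from_point_biserial(r_pb, p=sum(group) / len(group)))
```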

Semi-partial correlation
a measure of the relationship between two variables while 'controlling' the effect that one or more additional variables have on one of those variables. If we call our variables x and y, it gives us a measure of the variance in y that x alone shares.
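One way to implement the semi-partial correlation is to follow its definition directly. Regress x on the control variable z, keep the residuals (the part of x that z does not explain), and correlate those residuals with y. Nothing is removed from y. The sketch assumes a single control variable, and its names and data are hypothetical. Squaring the result gives the share of the variance in y that x alone accounts for.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(v, z):
    """Residuals of v after a simple linear regression of v on z."""
    mean_v, mean_z = sum(v) / len(v), sum(z) / len(z)
    slope = (sum((zi - mean_z) * (vi - mean_v) for zi, vi in zip(z, v))
             / sum((zi - mean_z) ** 2 for zi in z))
    return [vi - (mean_v + slope * (zi - mean_z)) for zi, vi in zip(z, v)]


def semipartial_correlation(y, x, z):
    """Correlation of y with x after removing z's influence from x only."""
    return pearson(y, residualize(x, z))


# Hypothetical data, for illustration only
x = [2, 4, 5, 7, 9, 10]
y = [1, 3, 3, 6, 8, 9]
z = [1, 2, 4, 4, 5, 7]
sr = semipartial_correlation(y, x, z)
print(sr, sr ** 2)
```

This residual approach gives the same value as the closed form (r_xy - r_yz * r_xz) / sqrt(1 - r_xz²).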

Spearman's correlation coefficient
a standardized measure of the strength of relationship between two variables that does not rely on the assumptions of a parametric test. It is Pearson's correlation coefficient performed on data that have been converted into ranked scores
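The card's description translates directly into code: rank each variable, then run Pearson's r on the ranks. The sketch assumes tied scores receive the average of the ranks they span, a common convention. Function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values get the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranked scores."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0: perfectly monotonic
```

The relationship in the example is curved, not linear, yet rho is exactly 1. Ranking discards everything except order.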

Standardization
the process of converting a variable into a standard unit of measurement. The unit of measurement typically used is standard deviation units. Standardization allows us to compare data when different units of measurement have been used
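Standardizing into standard deviation units means converting each score to a z-score. The sketch assumes the sample standard deviation (N - 1 denominator), and its data are hypothetical. It shows that the same heights measured in centimetres and in inches produce the same z-scores. That is the point of the card.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


heights_cm = [150, 160, 170, 180, 190]       # hypothetical heights
heights_in = [h / 2.54 for h in heights_cm]  # the same people, in inches
print(standardize(heights_cm))  # identical z-scores (up to rounding)
print(standardize(heights_in))
```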

$\beta_i$
standardized regression coefficient. It indicates the strength of relationship between a given predictor, i, and an outcome in a standardized form: it is the change in the outcome, in standard deviation units, associated with a one standard deviation change in the predictor
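A standardized coefficient can be obtained by rescaling the unstandardized slope: β_i = b_i × s_x / s_y. The same rescaling applies to each b_i in multiple regression. The sketch uses one predictor to stay short. In that case β equals Pearson's r, and it also equals the slope from regressing z-scored y on z-scored x. Names and data are hypothetical.

```python
from statistics import mean, stdev


def simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def standardized_coefficient(b, predictor, outcome):
    """Rescale an unstandardized slope b into beta (SD units)."""
    return b * stdev(predictor) / stdev(outcome)


x = [5, 4, 4, 6, 8]     # hypothetical predictor
y = [8, 9, 10, 13, 15]  # hypothetical outcome
b0, b1 = simple_regression(x, y)
beta = standardized_coefficient(b1, x, y)
print(b1, beta)  # with one predictor, beta equals Pearson's r
```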

DFFit
a measure of the influence of a case. It is the difference between the adjusted predicted value (the value predicted for the case by a model fitted without it) and the original predicted value of a particular case. If a case is not influential, its DFFit should be zero; hence, we expect non-influential cases to have small DFFit values. However, this statistic depends on the units of measurement of the outcome, so a DFFit of 0.5 will be very small if the outcome ranges from 1 to 100, but very large if the outcome varies from 0 to 1
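For a simple regression, DFFit can be computed by brute-force leave-one-out refitting. For each case, fit the line once with all cases and once without that case, then take adjusted minus original predicted value as the card defines it. Software may report the opposite sign. The sketch assumes a single predictor, and its names and data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mean_x, mean_y = mean(x), mean(y)
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    return mean_y - b1 * mean_x, b1


def dffit(x, y):
    """Per case: adjusted predicted value (case left out) minus original."""
    b0, b1 = fit_line(x, y)
    values = []
    for i in range(len(x)):
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        adjusted = a0 + a1 * x[i]   # model fitted without case i
        original = b0 + b1 * x[i]   # model fitted with every case
        values.append(adjusted - original)
    return values


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 20]  # hypothetical data; the last case is unusual
print(dffit(x, y))    # the last value is far from zero
```

The result is in the outcome's units, which is exactly the limitation the card describes.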

F-ratio
a test statistic with a known probability distribution. It is the ratio of the average variability in the data that a given model can explain to the average variability unexplained by that same model. It is used to test the overall fit of the model in simple regression and multiple regression, and to test for overall differences between group means in experiments.
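For a simple regression, the F-ratio comes out of the sum-of-squares breakdown that later cards in this list define (total, model and residual sums of squares, and mean squares). Each sum of squares is divided by its degrees of freedom: 1 for the model with one predictor, and n - 2 for the residuals. The ratio of the two mean squares is F. The sketch below assumes one predictor, and its names and data are hypothetical.

```python
from statistics import mean


def regression_anova(x, y):
    """Sums of squares, mean squares and F for a simple regression."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    predicted = [b0 + b1 * a for a in x]
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_model = ss_total - ss_residual
    ms_model = ss_model / 1              # df_model = number of predictors (1)
    ms_residual = ss_residual / (n - 2)  # df_residual = n - predictors - 1
    return {
        "SS_T": ss_total, "SS_M": ss_model, "SS_R": ss_residual,
        "MS_M": ms_model, "MS_R": ms_residual, "F": ms_model / ms_residual,
    }


x = [1, 2, 3, 4, 5]  # hypothetical predictor
y = [2, 4, 5, 4, 5]  # hypothetical outcome
print(regression_anova(x, y))
```

Here SS_M / SS_T also equals r², which links the F-ratio back to the coefficient of determination.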

Generalization
the ability of a statistical model to say something beyond the set of observations that spawned it. If a model generalizes, it is assumed that predictions from that model can be applied not just to the sample on which it is based, but to a wider population from which the sample came.

Goodness of fit
an index of how well a model fits the data from which it was generated. It's usually based on how well the data predicted by the model correspond to the data that were actually collected

Heteroscedasticity
the opposite of homoscedasticity. This occurs when the residuals at each level of the predictor variables have unequal variances. Put another way, at each point along any predictor variable, the spread of residuals is different

Hierarchical regression
a method of multiple regression in which the order in which predictors are entered into the regression model is determined by the researcher based on previous research: variables already known to be predictors are entered first, new variables are entered subsequently.

Homoscedasticity
an assumption in regression analysis that the residuals at each level of the predictor variables have similar variances

Independent errors
for any two observations in regression the residuals should be uncorrelated (or independent)

Mean squares
a measure of average variability, obtained by dividing a sum of squares by its degrees of freedom.

Model sum of squares
a measure of the total amount of variability for which a model can account. It is the difference between the total sum of squares and the residual sum of squares

Multicollinearity
a situation in which two or more variables are very closely linearly related

Multiple R
the multiple correlation coefficient. It is the correlation between the observed values of an outcome and the values of the outcome predicted by a multiple regression model
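Multiple R, as the card defines it, can be computed directly. Fit the multiple regression by least squares, then correlate the observed outcome with the model's predictions. The sketch below solves the normal equations with a small hand-written Gaussian elimination so it needs only the standard library. In practice a numerical library would be used. All names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfect collinearity?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def multiple_r(predictors, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (R, coefficients)."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    predicted = [sum(b * v for b, v in zip(coef, row)) for row in rows]
    return pearson(y, predicted), coef


x1 = [1, 2, 3, 4, 5, 6]  # hypothetical predictors and outcome
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 1, 4, 1, 5, 9]
R, coef = multiple_r([x1, x2], y)
print(R, R ** 2, coef)
```

Multiple R can never be smaller than the largest simple correlation between the outcome and any single predictor.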

Multiple regression
an extension of simple regression in which an outcome is predicted by a linear combination of two or more predictor variables

Outcome variable
a variable whose values we are trying to predict from one or more predictor variables

Perfect collinearity
exists when at least one predictor in a regression model is a perfect linear combination of the others

Predictor variable
a variable that is used to try to predict values of another variable known as an outcome variable

Residual
the difference between the value a model predicts and the value observed in the data on which the model is based

Residual sum of squares
a measure of the variability that cannot be explained by the model fitted to the data. It is the total squared deviance between the observations and the values of those observations predicted by whatever model is fitted to the data

Shrinkage
the loss of predictive power of a regression model if the model had been derived from the population from which the sample was taken, rather than the sample itself
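One common way to estimate shrinkage is to compare the sample R² with the adjusted R². The adjusted value estimates how much variance the model would explain had it been derived from the population. The sketch assumes Wherry's formula, the adjusted R² that most regression output reports. The inputs are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Wherry's adjusted R^2 for a model with k predictors fitted to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


r_squared = 0.50  # hypothetical sample R^2
adjusted = adjusted_r_squared(r_squared, n=21, k=2)
print(adjusted)              # about 0.444
print(r_squared - adjusted)  # estimated shrinkage, about 0.056
```

The estimated shrinkage grows as the number of predictors approaches the number of cases, and it vanishes as n becomes large.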

Simple regression
a linear model in which one variable or outcome is predicted from a single predictor variable

Standardized residuals
the residuals of a model expressed in standard deviation units
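The sketch below computes unstandardized residuals (observed minus predicted) from a simple regression and then expresses them in standard deviation units. It assumes the simple version, which divides every residual by the square root of the residual mean square. Studentized variants use a per-case standard error instead. Names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def residuals_simple_regression(x, y):
    """Unstandardized residuals (observed - predicted) from y = b0 + b1 * x."""
    mean_x, mean_y = mean(x), mean(y)
    b1 = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
          / sum((a - mean_x) ** 2 for a in x))
    b0 = mean_y - b1 * mean_x
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def standardized_residuals(x, y):
    """Residuals divided by sqrt(MS_residual), putting them in SD units."""
    res = residuals_simple_regression(x, y)
    ms_residual = sum(e ** 2 for e in res) / (len(res) - 2)  # df = n - 2
    return [e / sqrt(ms_residual) for e in res]


x = [1, 2, 3, 4, 5]  # hypothetical predictor
y = [2, 4, 5, 4, 5]  # hypothetical outcome
print(residuals_simple_regression(x, y))  # in the outcome's own units
print(standardized_residuals(x, y))       # unit-free
```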

Stepwise regression
a method of multiple regression in which variables are entered into the model based on a statistical criterion

Suppressor effects
when a predictor has a significant effect but only when another variable is held constant

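t-statistic
Student's t is a test statistic with a known probability distribution (the t-distribution). In regression, it is used to test whether a regression coefficient b differs significantly from zero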
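In a simple regression, the t-statistic for the slope is b1 divided by its standard error. The standard error is the square root of the residual mean square divided by the predictor's sum of squared deviations. The degrees of freedom are n - 2. The sketch assumes one predictor and uses hypothetical data. Turning t into a p-value requires a t-distribution function, which Python's standard library does not provide.

```python
from math import sqrt
from statistics import mean


def slope_t(x, y):
    """t = b1 / SE(b1) for the slope of a simple regression; df = n - 2."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    ss_residual = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = sqrt(ss_residual / (n - 2) / sxx)
    return b1 / se_b1, n - 2


t, df = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(t, df)
```

With a single predictor, t² equals the model's F-ratio, so the two tests agree.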

Tolerance
tolerance statistics measure multicollinearity and are simply the reciprocal of the variance inflation factor (1/VIF)

Total sum of squares
a measure of the total variability within a set of observations

Unstandardized residuals
the residuals of a model expressed in the units in which the original outcome variable was measured

Variance inflation factor (VIF)
a measure of multicollinearity. The VIF indicates whether a predictor has a strong linear relationship with the other predictor(s)
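In general, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. Tolerance is 1 - R_j², the reciprocal of the VIF. The sketch below covers only the two-predictor case, where R_j² is simply the squared correlation between the two predictors. Names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from sums of squared and cross-product deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_and_tolerance(x1, x2):
    """VIF and tolerance for a model with exactly two predictors.

    With two predictors, regressing either one on the other gives
    R_j^2 = r(x1, x2)^2, so both predictors share the same VIF.
    """
    tolerance = 1 - pearson(x1, x2) ** 2
    if tolerance < 1e-12:
        raise ValueError("perfect collinearity: VIF is undefined")
    return 1 / tolerance, tolerance


vif, tol = vif_and_tolerance([1, 2, 3, 4], [1, 3, 2, 4])  # hypothetical predictors
print(vif, tol)  # tolerance = 1 / VIF
```

Uncorrelated predictors give the minimum VIF of 1. The VIF grows without bound as the predictors approach perfect collinearity.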

