Card Set Information

2015-11-10 20:37:16

  1. What do good models do? Why is it important to have a good fitting model?
    Good models reduce the total squared error of prediction
  2. We can quantify the fit of the model in terms of the deviations between the data (_) and the model prediction (_)
    (yi), (yhati)
  3. The deviations (y-yhat) are called the ___
    Model Residuals
  4. The sum of squared residuals is a ___
    • - Measure of model fit
    • - smaller = better
    • - Convenient/intuitive measure of error of prediction
  5. Outcome/predictor?
    lm(formula = outcome ~ predictor, data = parenthood)
    dangrump ~ dansleep (outcome on the left of ~, predictor on the right)
  6. RSS =
    Σ(Yi - Yihat)^2
  7. TSS=
    • Σ(Yi - Ybar)^2
    • bar=mean
  8. Proportion of variance unexplained (Normalised least-squares error)
    RSS/TSS
  9. Proportion of variance explained (coefficient of determination or R2)
    1 - RSS/TSS
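
Cards 6 to 9 fit together in a few lines of arithmetic. The sketch below uses the standard definitions (proportion unexplained = RSS/TSS, R2 = 1 - RSS/TSS) on made-up numbers; the data and the function names `rss` and `tss` are illustrative assumptions, not part of the card set.

```python
# Hypothetical toy data: observed outcomes and one model's predictions.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]


def rss(y, y_hat):
    """Residual sum of squares: sum of (Yi - Yihat)^2."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def tss(y):
    """Total sum of squares: sum of (Yi - Ybar)^2, where Ybar is the mean."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)


unexplained = rss(y, y_hat) / tss(y)  # proportion of variance unexplained
r_squared = 1 - unexplained           # proportion of variance explained (R2)

print(f"RSS={rss(y, y_hat):.2f}  TSS={tss(y):.2f}  R2={r_squared:.2f}")
```

A smaller RSS relative to TSS pushes R2 towards 1, which is the sense in which "smaller = better".
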
  10. A model can be defined as (2)
    • 1) a set of parameters
    • 2) a rule for combining the parameters
  11. Model parameters can also be called: (2)
    • 1) weights
    • 2) coefficients
    They are chosen to:
    • minimize RSS
  12. If the model fits the data then the _ is small, or the _ is large
    RSS, coefficient of determination (R2)
  13. Steps in constructing a statistical test (3)
    • 1) Specify a null-hyp
    • 2) Identify a test-statistic of interest
    • 3) Determine the sampling distribution of the test stat under the assumption that the null hyp is true (plus any other assumptions you have to make so it will work)
  14. Steps in applying the statistical test (4)
    • 1) Collect data
    • 2) calculate value of test stat
    • 3) Compare this value against relevant sampling distribution if null hyp is true
    • 4) If probability of observing at least this value is smaller than some criterion, reject null hyp
  15. The F distribution is handy because it:
    is the sampling distribution of the F test statistic when the null hyp is true, so it is used to test our null hyp under the linear model
  16. Linear model formula:
    • yi = b0 + b1xi + ei
    • yi = outcome
    • b0 = intercept
    • b1 = slope
    • xi = predictor
    • ei = residual/error
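
As a rough illustration of how the intercept and slope are chosen to minimize RSS, here is a minimal sketch of the closed-form least-squares fit for one predictor. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
predictions = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]  # e = y - yhat

print(f"b0={b0:.2f}  b1={b1:.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.
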
  17. "Additive" =…(2)
    • If 2+ UNCORRELATED predictors, total proportion of explained variance is additive
    • Additive models don't include any interaction
  18. "Sub-additive" = ...
    If 2+ CORRELATED predictors, total proportion of explained variance is sub-additive
  19. Semi-partial (part) correlation:
    rY(X.Z) =
    CORRELATION between Y and X with the effect of Z removed from X only
  20. Partial correlation: rYX.Z =
    CORRELATION between X and Y with the effect of Z removed from both X and Y
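
The two quantities in cards 19 and 20 can be computed from the three pairwise correlations using the standard textbook formulas. In this sketch, rY(X.Z) (Z removed from X only) is what is usually called the semi-partial or part correlation, and rYX.Z (Z removed from both) is the partial correlation. The function names and the correlation values are illustrative assumptions.

```python
import math


def semipartial_r(r_yx, r_yz, r_xz):
    """rY(X.Z): correlation of Y with the part of X not explained by Z."""
    return (r_yx - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)


def partial_r(r_yx, r_yz, r_xz):
    """rYX.Z: correlation of Y and X after removing Z from both."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))


# Hypothetical pairwise correlations.
r_yx, r_yz, r_xz = 0.5, 0.4, 0.3

sr = semipartial_r(r_yx, r_yz, r_xz)
pr = partial_r(r_yx, r_yz, r_xz)
print(f"semi-partial={sr:.3f}  partial={pr:.3f}")
```

The two formulas differ only in the denominator, so the partial correlation is never smaller in magnitude than the semi-partial one.
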
  21. Dilution effect = (3)
    • Adding non-predictive predictors reduces efficacy of model
    • But salient predictors remain salient
    • Suggests paying more attention to tests of individual coefficients
  22. Collinearity effect = (3)
    • Very high correlation between 2 predictors INFLATES their standard errors
    • None of the coefficients of the correlated predictors may be significant
    • Suggests paying more attention to the overall test of all the coefficients together
  23. Variance Inflation Factor (VIF) (3)
    • Way of checking for (multi)collinearity
    • If you regress a redundant predictor onto all other predictors, the resulting R2 will be very high (VIF = 1/(1 - R2))
    • Rule = abandon hope if VIF>10
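
A small sketch of the VIF arithmetic, using the standard definition VIF = 1/(1 - R2), where R2 comes from regressing one predictor on all the others. The function name `vif` and the R2 values are hypothetical; the threshold of 10 is the rule of thumb from the card.

```python
def vif(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has the given R2."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical R2 values for three predictors.
for name, r2 in [("x1", 0.10), ("x2", 0.50), ("x3", 0.95)]:
    flag = "check for collinearity" if vif(r2) > 10 else "ok"
    print(f"{name}: R2={r2:.2f}  VIF={vif(r2):.1f}  {flag}")
```

An uncorrelated predictor (R2 = 0) has VIF = 1, and the VIF>10 rule corresponds to R2 above 0.9.
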
  24. Forward Selection=
    Start with no predictors and add ones you think will work
  25. Backward selection=
    Start with all predictors and pull out ones that don't work
  26. Two attitudes towards data (4 and 4)
    • Planned:
    • - A priori
    • - Confirmatory
    • - Hypothesis testing
    • - Minimal/controlled capitalisation on chance

    • Unplanned:
    • - Post hoc
    • - Exploratory
    • - Prediction
    • - Maximal/uncontrolled capitalisation on chance
  27. Planned model building (3)
    • Regression model based on theoretical/practical context
    • Hypotheses determined by questions of interest
    • Mostly avoids capitalisation on chance
  28. Regression (linear model) assumptions: (5)
    • 1. Residuals have no discernible structure - outcome is modelled as a LINEAR function of the predictors (multiply predictors by a coefficient, add together -> "prediction")
    • 2. Residuals are independent (uncorrelated)
    • 3. Residuals are normally distributed (mean of 0, some kind of SD)
    • 4. Residuals have a constant variance (Homoscedasticity)
    • 5. No outliers (no extreme residuals that are likely to distort the results)
  29. Normal distribution of residuals assumption of linear regression can be tested/viewed using: (2)
    • Quantile-quantile (QQ) plots
    • Shapiro-Wilk test (W close to 1 = consistent with normality; small W with p<.05 = not normally distributed)
  30. What to do about non-normal residuals? (3)
    • Ignore it
    • Transform 1 or more variables
    • Try another more complicated approach
  31. Homoscedasticity=
    • Population SD is same in both gps
    • Tested with a chi sq statistic
    • P<.05 = homosc is violated
  32. _ deals with factors, _ deals with numeric
    ANOVA deals with factors, MULTIPLE REGRESSION deals with numeric
  33. t-tests: (3)
    • Compare 2 means of an outcome variable
    • The 2 gps are defined by the levels of intervention: (No, Yes)
    • Null hyp = both means are the same
  34. Dummy (numeric) variable coding: Comparison between 2 gps (2)
    • Think of gps as CATEGORICAL variable (or "factor") and conduct t-test accordingly 
    • Think of gps as defining a numeric (dummy) predictor: 1 gp has 1 level of predictor, other gp has other level of predictor
  35. If equal variance is assumed, t-tests comparing 2 means of an outcome variable (H0:mu1=mu2) are equivalent to:
    • a test of the regression equation: y=b0+b1x+e
    • (x is a dummy variable coding for the group, and the null hyp is equivalent to H0:b1=0)
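
The equivalence in card 35 can be checked numerically. The sketch below computes the equal-variance (pooled) t statistic for two groups, then fits y = b0 + b1x + e with a 0/1 dummy for the group and computes the t statistic for b1. All names and data here are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (mean(b) - mean(a))."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def regression_slope_t(x, y):
    """Least-squares b0, b1 and the t statistic for H0: b1 = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


# Hypothetical data: intervention = No (dummy 0) and Yes (dummy 1).
group_no = [4.0, 5.0, 6.0, 5.0]
group_yes = [7.0, 8.0, 6.0, 9.0]

x = [0.0] * len(group_no) + [1.0] * len(group_yes)
y = group_no + group_yes

b0, b1, t_regression = regression_slope_t(x, y)
t_test = pooled_t(group_no, group_yes)
print(f"b0={b0:.2f}  b1={b1:.2f}  t(regression)={t_regression:.4f}  t(t-test)={t_test:.4f}")
```

Here b0 is the mean of the group coded 0 and b1 is the difference between the two group means, so testing b1 = 0 is testing mu1 = mu2.
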
  36. Why called "one-way" ANOVA?
    • Only one variable (factor) is used to predict the outcome, and it has multiple levels.
    • With only two levels, it is equivalent to a t-test (equal variances assumed)
  37. Anova - Factorial Design: (4)
    • 2 or more factors are orthogonally (independent of each other) combined/crossed (aka fully crossed design)
    • Each CELL defined by choice of level across all factors (factors= Treatment & Expectations)
    • Allows effects of multiple factors to be estimated SIMULTANEOUSLY
    • Reduces residual error
    If each cell has the SAME no of observations, it's called a balanced design
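  39. eta-squared (η2) =
    • Effect size: proportion of total variance explained by an effect (SS effect / SS total)
    • Partial eta sq = SS effect / (SS effect + SS residual), the ANOVA counterpart of partial R2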
  40. Interactions = (4)
    • Any departure from additive model = interaction
    • Effect of one factor not same at each level of other factor
    • Effect of one factor DEPENDS on level of other, effects of the two factors are NOT INDEPENDENT 
    • Interaction is an EFFECT: has a size + can be measured
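
One way to see that an interaction "has a size" is the difference of differences in a 2x2 table of cell means. This is a minimal sketch with made-up means; the factor names follow the Treatment and Expectations example used in these cards, but the level labels and numbers are assumptions.

```python
# Hypothetical cell means, keyed by (treatment level, expectation level).
cell_means = {
    ("placebo", "low"): 10.0,
    ("placebo", "high"): 12.0,
    ("drug", "low"): 11.0,
    ("drug", "high"): 17.0,
}


def expectation_effect(means, treatment):
    """Effect of expectations (high - low) at one level of treatment."""
    return means[(treatment, "high")] - means[(treatment, "low")]


effect_placebo = expectation_effect(cell_means, "placebo")
effect_drug = expectation_effect(cell_means, "drug")

# Under a purely additive model these two effects are equal,
# so their difference measures the departure from additivity.
interaction = effect_drug - effect_placebo
print(f"effect at placebo={effect_placebo}, effect at drug={effect_drug}, interaction={interaction}")
```

An interaction of zero would mean the effect of expectations is the same at every level of treatment.
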
  41. A set of factors are orthogonal if: (3)
    • They are fully crossed
    • There are equal number of observations in each cell
  42. If factors not orthogonal, they share some explained variance. Either: (2)
    • 1. The common variance is assigned to one of the correlated factors or
    • 2. The common variance is assigned to none of the correlated factors
    • (Anova takes option 1 - may not be appropriate)
  43. Type I SOS:(4)
    • Allocate order in which factors enter the model
    • Called "Sequential Sums of Squares/Type I"
    • SAME as forward selection
    • Method assumed by ANOVA
  44. Type II SOS: (2)
    • Allocate ONLY UNIQUE variance
    • Only works if NO interaction
  45. Type III SOS: (3)
    • Do not allocate ANY common variance to any factor/interaction
    • Works if SIGNIFICANT interaction
    • But main effects may be reduced
  46. 2 factors: treatment and expectations. Treatment has 3 levels, Expectations has 2 levels. This is called a __ anova
    3x2 anova
  47. Crossing the 2 factors creates a structure with __ cells
    6 (3x2)
  48. The mean of each cell is indexed by___
    the levels of the 2 factors
  49. F is a ratio of …
    How to calculate?
    • F is a ratio of mean squares
    • Divide the effect's mean square by the residual mean square
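
The calculation can be sketched for the simplest case, a one-way design: the between-groups mean square is divided by the residual (within-groups) mean square. The function name `one_way_f` and the data are hypothetical.

```python
def one_way_f(groups):
    """F ratio for a one-way ANOVA: MS between groups / MS residual."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_residual = sum((v - m) ** 2
                      for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (k - 1)    # df = number of groups - 1
    ms_residual = ss_residual / (n - k)  # df = observations - groups
    return ms_between / ms_residual


# Hypothetical outcome scores for three levels of one factor.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_ratio = one_way_f(groups)
print(f"F = {f_ratio:.2f}")
```

The resulting F is then compared against the F distribution with (k - 1, n - k) degrees of freedom.
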
  50. Contrasts (3)
    • A planned comparison - (tests meaningful hypothesis)
    • A linear combination of group means whose weights sum to zero
    • Another way of specifying dummy variables
  51. Post hoc pairwise comparisons (4)
    • If you don't have any meaningful hypotheses can conduct set of these
    • EXPLORATORY rather than confirmatory
    • Proposed after meaningful hyps have been tested against data
    • Compare mean of each cell with mean of every other cell controlling for the FAMILYWISE ERROR RATE
  52. Type 1 error =
    reject null hyp when it's true
  53. Familywise error rate
    If you test k independent hypotheses, each at level a, the probability of making at least 1 type 1 error is 1-(1-a)^k
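
To get a feel for how quickly the familywise error rate grows, here is a short sketch of 1-(1-a)^k, assuming the k tests are independent and each uses the same level a. The function name is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one Type 1 error across k independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 3, 6, 15):
    print(f"k={k:2d}  familywise error rate={familywise_error_rate(0.05, k):.3f}")
```

With a single test the rate is just a; it rises with every extra comparison, which is why post hoc pairwise comparisons need to control it.
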
  54. ANCOVA
    Hybrid form of multiple regression + ANOVA
  55. ANCOVA combines:
    • 1 or more CATEGORICAL factors (as dummy variables/contrasts) + 1 or more CONTINUOUS predictors (called covariates)
    • (Interest usually lies in the effects of the FACTORS on the DV)
  56. In ANCOVA, the covariates serve 2 main purposes:
    • 1) To reduce residual error/variance
    • 2) To "control" for possible confounding effects of the covariate(s)
  57. In ANCOVA, it is desirable that the covariate be at least __ correlated with the DV, and at most __ correlated with the FACTOR of interest
    moderately, weakly
  58. The __ is based on the variance of the differences between conditions across participants. 
    This is the same as the __ between the __ and the __
    The paired samples t-test is based on the variance of the differences between conditions across participants.

    This is the same as the interaction between the factor (time) and the subjects variable (id).
  59. Assumptions of repeated measures (3)
    • 1. Independence of "subjects" (assume unrelated to each other)
    • 2. Normal distribution within each cell
    • 3. Sphericity
  60. Sphericity =
    Assumption that the VARIANCES of differences between each pair of within-subjects cells are EQUAL
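
The quantity that sphericity is about can be computed directly: for each pair of within-subjects cells, take the per-subject differences and find their variance. This sketch only describes the data; it is not a test such as Mauchly's. The condition names and scores are made up.

```python
from itertools import combinations
from statistics import variance

# Hypothetical scores: one list per within-subjects condition,
# with subjects in the same order in every list.
scores = {
    "t1": [5.0, 6.0, 7.0, 8.0],
    "t2": [6.0, 8.0, 7.0, 10.0],
    "t3": [9.0, 9.0, 12.0, 12.0],
}


def difference_variances(scores):
    """Sample variance of the per-subject differences for every pair of conditions."""
    result = {}
    for a, b in combinations(scores, 2):
        diffs = [x - y for x, y in zip(scores[a], scores[b])]
        result[(a, b)] = variance(diffs)
    return result


diff_vars = difference_variances(scores)
for pair, var in diff_vars.items():
    print(pair, round(var, 3))
```

Sphericity holds when these variances are (roughly) equal across all pairs of conditions.
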
  61. Sphericity can be tested using:
    • Mauchly test of sphericity
    • Example output: W=.0004, p<2.2e-16
    • = REJECT hypothesis that variances of differences are equal (sphericity is violated)
  62. Why are the Greenhouse-Geisser and Huynh-Feldt corrections sometimes required by a repeated measures anova?
    • These corrections are applied to the degrees of freedom of an F-ratio in order to adjust for failure of the sphericity assumption in repeated measures anova.
  63. Diagrams serve __ and __ functions
    • Expository: explain/provide info
    • Productive: generate new info
  64. Graphs: (2)
    • Diagrams that exhibit relationship between 2 sets of numbers as a set of points having coordinates determined by the relationship (plots).
    • Used to illustrate relationships (charts)
  65. ggplot2 package invokes the following terminology (4)
    • Aesthetics - map data onto logical elements of graph
    • Geoms (geometric objects) - specify how elements of graph are represented
    • Themes - Modifies look/feel of graph elements
    • Others