# DRIP

The flashcards below were created by user peep_muri on FreezingBlue Flashcards.

1. What do good models do? Why is it important to have a good fitting model?
To reduce total squared error
2. We can quantify the fit of the model in terms of the deviations from the data (_), and the model prediction (_)
$y_i$, $\hat{y}_i$
3. The deviations (y-yhat) are called the ___
Model Residuals
4. The sum of squared residuals is a ___
• - Measure of model fit
• - smaller = better
• - Convenient/intuitive measure of error of prediction
5. Outcome/predictor? Fill in the blanks for the variables dansleep and dangrump:
lm(formula=_~_, data=parenthood)
dangrump~dansleep (outcome to the left of ~, predictor to the right)
6. Sum of squared residuals =
$\sum_i (Y_i - \hat{Y}_i)^2$
7. TSS (total sum of squares) =
• $\sum_i (Y_i - \bar{Y})^2$
• $\bar{Y}$ = mean of Y
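8. Proportion of variance unexplained (normalised least-squared error) = $\text{SS}_{res} / \text{TSS}$ (sum of squared residuals divided by TSS)
9. Proportion of variance explained (coefficient of determination, $R^2$) = $1 - \text{SS}_{res} / \text{TSS}$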
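A minimal Python sketch of cards 6 to 9, using made-up toy numbers; the function name `fit_summary` is hypothetical. It computes the sum of squared residuals, TSS and $R^2$ from observed values and model predictions:

```python
def fit_summary(y, y_hat):
    """Return (SS_res, TSS, R^2) for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)                                   # mean of the outcome
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)                  # total sum of squares
    unexplained = ss_res / tss                                # proportion of variance unexplained
    return ss_res, tss, 1 - unexplained                       # R^2 = proportion explained


# Toy data (made up): y_hat is the least-squares line 2.2 + 0.6x for x = 1..5
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
ss_res, tss, r2 = fit_summary(y, y_hat)
```

Here SS_res is about 2.4 and TSS is 6, so about 40% of the variance is unexplained and $R^2 \approx 0.6$.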
10. A model can be defined as (2)
• 1) a set of parameters
• 2) a rule for combining the parameters
11. Model parameters can also be called: (2)
• 1) weights
• 2) coefficients
They are chosen to: minimise the sum of squared residuals (least squares)
12. If the model fits the data then the _ is small, or the _ is large
Proportion of variance unexplained is small; proportion of variance explained ($R^2$) is large
13. Steps in constructing a statistical test (3)
• 1) Specify a null-hyp
• 2) Identify a test-statistic of interest
• 3) Determine the sampling distribution of the test stat under the assumption that the null hyp is true (plus any other assumptions needed for it to work)
14. Steps in applying the statistical test (4)
• 1) Collect data
• 2) calculate value of test stat
• 3) Compare this value against the sampling distribution expected if the null hyp is true
• 4) If the probability of observing a value at least this extreme is smaller than some criterion (α), reject the null hyp
15. The F distribution is handy because it:
is used to test our null hyp under the linear model
16. Linear model formula:
$y_i = b_0 + b_1 x_i + e_i$
• $y_i$ = outcome
• $b_0$ = intercept
• $b_1$ = slope
• $x_i$ = predictor
• $e_i$ = residual/error
• If 2+ UNCORRELATED predictors, total proportion of explained variance is additive
• Additive models don't include any interaction
If 2+ CORRELATED predictors, total proportion of explained variance is sub-additive
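As a sketch of how $b_0$ and $b_1$ in the linear model formula are chosen (least squares, i.e. minimising the sum of squared residuals), here is the closed-form fit for a single predictor in plain Python. `fit_simple` is a hypothetical name and the data are made up; in R the same estimates come from `lm()`:

```python
def fit_simple(x, y):
    """Least-squares estimates (b0, b1) for y_i = b0 + b1*x_i + e_i."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = fit_simple(x, y)  # roughly b0 = 2.2, b1 = 0.6
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

When the model includes an intercept, the least-squares residuals sum to zero (up to rounding).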
19. Semipartial (part) correlation:
$r_{Y(X.Z)}$ =
CORRELATION between Y and X with the effect of Z removed from X only
20. Partial correlation: $r_{YX.Z}$ =
CORRELATION between X and Y with the effect of Z removed from both X and Y
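Both quantities can be written in terms of the three pairwise correlations. The sketch below uses the standard textbook formulas; the function names and example correlations are made up for illustration:

```python
import math


def partial_r(r_yx, r_yz, r_xz):
    """Partial correlation r_YX.Z: effect of Z removed from both Y and X."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))


def semipartial_r(r_yx, r_yz, r_xz):
    """Semipartial (part) correlation r_Y(X.Z): effect of Z removed from X only."""
    return (r_yx - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)


# Hypothetical correlations among Y, X and Z
pr = partial_r(0.5, 0.4, 0.3)
sr = semipartial_r(0.5, 0.4, 0.3)
```

The numerators are identical and only the denominators differ, so the semipartial correlation is never larger in magnitude than the partial one.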
21. Dilution effect = (3)
• Adding non-predictive predictors reduces the efficacy of the model
• But salient predictors remain salient
• Suggests paying more attention to tests of individual coefficients
22. Collinearity effect = (3)
• Very high correlation btw 2 predictors INFLATES their standard errors
• None of the coefficients of the correlated predictors may be significant
• Suggests paying more attention to the joint test of all the coefficients (the overall F-test)
23. Variance Inflation Factor (VIF) (3)
• Way of checking for (multi)collinearity
• If you regress a redundant predictor onto all the other predictors, the resulting R2 will be very high, and so will its VIF
• Rule = abandon hope if VIF>10
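The VIF for predictor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ onto all the other predictors, and $\sqrt{\text{VIF}_j}$ is the factor by which its standard error is inflated. A minimal sketch with a hypothetical `vif` helper; with exactly two predictors, $R_j^2$ is simply their squared correlation:

```python
import math


def vif(r2_j):
    """Variance inflation factor, given R^2 from regressing predictor j on the others."""
    return 1.0 / (1.0 - r2_j)


# Two predictors correlated at r: R_j^2 = r^2
for r in (0.0, 0.5, 0.9, 0.95, 0.99):
    v = vif(r ** 2)
    print(f"r = {r:.2f}  VIF = {v:6.2f}  SE inflated x{math.sqrt(v):.2f}")
```

By the VIF > 10 rule of thumb, two predictors would need to correlate above roughly 0.95 before hope is abandoned.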
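24. Forward selection =
Start with no predictors; add them one at a time (at each step, the one that most improves fit) until no further addition improves the model
25. Backward selection =
Start with all predictors; remove them one at a time (at each step, the one whose removal most improves or least worsens fit) until no further removal improves the model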
26. Two attitudes towards data (4 and 4)
• Planned:
• - A priori
• - Confirmatory
• - Hypothesis testing
• - Minimal/controlled capitalisation on chance

• Unplanned:
• - Post hoc
• - Exploratory
• - Prediction
• - Maximal/uncontrolled capitalisation on chance
27. Planned model building (3)
• Regression model based on theoretical/practical context
• Hypotheses determined by questions of interest
• Mostly avoids capitalisation on chance
28. Regression (linear model) assumptions: (5)
• 1. Residuals have no discernible structure - outcome is modelled as a LINEAR function of the predictors (multiply predictors by a coefficient, add together -> "prediction")
• 2. Residuals are independent (uncorrelated)
• 3. Residuals are normally distributed (mean of 0, some SD)
• 4. Residuals have a constant variance (homoscedasticity)
• 5. No outliers (no individual observations with extreme residuals distorting the results)
29. Normal distribution of residuals assumption of linear regression can be tested/viewed using: (2)
• Quantile-quantile (Q-Q) plots of the residuals
• Shapiro-Wilk test (W close to 1 = consistent with normality; small W with p < .05 = residuals not normally distributed)
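A Q-Q plot pairs each sorted residual with the quantile a normal distribution would predict at that rank; points lying close to a straight line suggest normality. A sketch using only Python's standard library (`statistics.NormalDist`); the plotting positions $(i - 0.5)/n$ are one common convention among several, and the residuals and the `qq_points` name are made up:

```python
from statistics import NormalDist


def qq_points(residuals):
    """(theoretical normal quantile, observed residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    observed = sorted(residuals)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Plotting positions (i - 0.5) / n for ranks i = 1..n
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


points = qq_points([0.3, -1.2, 0.8, -0.1, 1.5])
```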
30. What to do about non-normal residuals? (3)
• Ignore it
• Transform 1 or more variables
• Try another more complicated approach
31. Homoscedasticity=
• Population SD is same in both gps
• Can be tested with a χ² (chi-square) test statistic
• p < .05 = homoscedasticity is violated
32. _ deals with factors, _ deals with numeric
ANOVA deals with factors, MULTIPLE REGRESSION deals with numeric
33. t-tests: (3)
• Compare 2 means of an outcome variable
• The 2 gps are defined by the levels of intervention: (No, Yes)
• Null hyp = both means are the same
34. Dummy (numeric) variable coding: Comparison between 2 gps (2)
• Think of gps as CATEGORICAL variable (or "factor") and conduct t-test accordingly
• Think of gps as defining a numeric (dummy) predictor: 1 gp has 1 level of predictor, other gp has other level of predictor
35. If equal variance is assumed, t-tests comparing 2 means of an outcome variable (H0:mu1=mu2) are equivalent to:
• a test of the regression equation: y=b0+b1x+e
• (x is a dummy variable coding for the group, and the null hyp is equivalent to H0:b1=0)
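A quick numerical check of this equivalence, sketched in plain Python with made-up scores: the pooled-variance t statistic equals the t statistic for $b_1$ when the group is dummy coded 0/1. The function names are hypothetical:

```python
import math
from statistics import mean, variance


def pooled_t(group0, group1):
    """Student's t for H0: mu1 = mu0, assuming equal population variances."""
    n0, n1 = len(group0), len(group1)
    # Pooled variance estimate
    sp2 = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def dummy_regression_t(group0, group1):
    """t for H0: b1 = 0 in y = b0 + b1*x + e, with x = 0 for group0 and x = 1 for group1."""
    x = [0] * len(group0) + [1] * len(group1)  # dummy-coded group membership
    y = list(group0) + list(group1)
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1


# Made-up outcome scores for the two levels of the intervention
no = [5.1, 4.8, 6.0, 5.5]
yes = [6.2, 6.9, 5.8, 7.1, 6.5]
t_test = pooled_t(no, yes)
t_regression = dummy_regression_t(no, yes)
```

With this coding, $b_1$ is the difference between the two group means and $b_0$ is the mean of the group coded 0.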
36. Why called "one-way" ANOVA?
• Only one variable (a factor) is used to predict the outcome, and it can have multiple levels.
• With only two levels, it is equivalent to an independent-samples (Student's) t-test, with F = t²
37. Anova - Factorial Design: (4)
• 2 or more factors are orthogonally (independent of each other) combined/crossed (aka fully crossed design)
• Each CELL defined by choice of level across all factors (factors= Treatment & Expectations)
• Allows effects of multiple factors to be estimated SIMULTANEOUSLY
• Reduces residual error
38. BALANCED DESIGN =
If each cell has the SAME no of observations, it's called a balanced design
39. eta-squared (n2) =
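• Effect size: $\eta^2 = \text{SS}_{effect} / \text{SS}_{total}$
• Partial $\eta^2 = \text{SS}_{effect} / (\text{SS}_{effect} + \text{SS}_{residual})$, the same quantity as partial R2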
40. Interactions = (4)
• Any departure from additive model = interaction
• Effect of one factor not same at each level of other factor
• Effect of one factor DEPENDS on level of other, effects of the two factors are NOT INDEPENDENT
• Interaction is an EFFECT: has a size + can be measured
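One way to see an interaction as a departure from the additive model: in a balanced design, the main-effects-only (additive) prediction for a cell is row mean + column mean − grand mean, and the interaction is what is left over. A sketch with hypothetical cell means and a hypothetical function name; an all-zero result means no interaction:

```python
def interaction_effects(cell_means):
    """Departure of each cell mean from the additive (main-effects-only) prediction.

    cell_means[i][j] is the mean for level i of factor A and level j of factor B.
    Assumes a balanced design (equal n per cell).
    """
    r, c = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (r * c)
    row_m = [sum(row) / c for row in cell_means]
    col_m = [sum(cell_means[i][j] for i in range(r)) / r for j in range(c)]
    return [[cell_means[i][j] - (row_m[i] + col_m[j] - grand) for j in range(c)]
            for i in range(r)]


# 3x2 design whose cell means are exactly additive: every departure is 0
no_interaction = interaction_effects([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# 2x2 design where the effect of B is bigger at the second level of A
with_interaction = interaction_effects([[1.0, 2.0], [3.0, 6.0]])  # [[0.5, -0.5], [-0.5, 0.5]]
```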
41. A set of factors are orthogonal if: (3)
• They are fully crossed
• There are equal number of observations in each cell
42. If factors not orthogonal, they share some explained variance. Either: (2)
• 1. The common variance is assigned to one of the correlated factors or
• 2. The common variance is assigned to none of the correlated factors
• (Anova takes option 1 - may not be appropriate)
43. Type I SOS:(4)
• Common variance is allocated according to the order in which factors enter the model
• Called "Sequential Sums of Squares/Type I"
• SAME as forward selection
• Method assumed by ANOVA
44. Type II SOS: (2)
• Allocate ONLY UNIQUE variance
• Only works if NO interaction
45. Type III SOS: (3)
• Do not allocate ANY common variance to any factor/interaction
• Works if SIGNIFICANT interaction
• But main effects may be reduced
46. 2 factors: treatment and expectations. Treatment has 3 levels, Expectations has 2 levels. This is called a __ anova
3x2 anova
47. Crossing the 2 factors creates a structure with __ cells
6 (3x2)
48. The mean of each cell is indexed by___
the levels of the 2 factors
49. F is a ratio of …
How to calculate?
• F is a ratio of mean squares
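• Divide the effect's mean square by the residual mean square: $F = \text{MS}_{effect} / \text{MS}_{residual}$, where each $\text{MS} = \text{SS} / df$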
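A plain-Python sketch of a one-way ANOVA, assuming independent groups: each mean square is SS/df, F is MS_between/MS_residual, and $\eta^2$ = SS_between/SS_total. The function name and data are hypothetical:

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way ANOVA on a list of independent groups."""
    all_y = [y for g in groups for y in g]
    grand = mean(all_y)
    k, n = len(groups), len(all_y)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)  # residual SS
    ms_between = ss_between / (k - 1)  # each mean square = SS / df
    ms_within = ss_within / (n - k)
    f = ms_between / ms_within
    eta2 = ss_between / (ss_between + ss_within)  # SS_effect / SS_total
    return f, eta2


# Two made-up groups: here F = 13.5, which equals t**2 from the pooled t-test
f, eta2 = one_way_anova([[1, 2, 3], [4, 5, 6]])
```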
50. Contrasts (3)
• A planned comparison - (tests meaningful hypothesis)
• A linear combination of group means (or predictors) whose weights sum to zero
• Another way of specifying dummy variables
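A contrast's estimate is the weighted sum of the group means. A minimal sketch with made-up data and a hypothetical function name; testing the estimate would additionally need its standard error, which is omitted here:

```python
from statistics import mean


def contrast_value(groups, weights):
    """Estimate of a contrast: the weighted sum of group means, with weights summing to zero."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * mean(g) for w, g in zip(weights, groups))


# Hypothetical 3-level factor: does level 1 differ from the average of levels 2 and 3?
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
estimate = contrast_value(groups, [-1.0, 0.5, 0.5])  # -2 + 2.5 + 4 = 4.5
```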
51. Post hoc pairwise comparisons (4)
• If you don't have any meaningful hypotheses can conduct set of these
• EXPLORATORY rather than confirmatory
• Proposed after meaningful hyps have been tested against data
• Compare mean of each cell with mean of every other cell controlling for the FAMILYWISE ERROR RATE
52. Type 1 error =
Rejecting the null hyp when it's true
53. Familywise error rate
If you test k independent hypotheses, each at level α, the probability of making at least 1 type 1 error is $1-(1-\alpha)^k$ (and for any k tests it is at most $k\alpha$)
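A sketch of the arithmetic for independent tests, plus the Bonferroni correction (dividing α by k) as one standard way of keeping the familywise error rate under control; the function names are hypothetical:

```python
def familywise_error(alpha, k):
    """P(at least one Type I error) across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / k


for k in (1, 3, 10, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With 10 independent tests at α = .05, the chance of at least one Type I error is already about .40; testing each at .005 instead brings it back under .05.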
54. ANCOVA
Hybrid form of multiple regression + ANOVA
55. ANCOVA combines:
• 1 or more CATEGORICAL factors (as dummy variables/contrasts) + 1 or more CONTINUOUS predictors (called covariates)
• (Interest usually lies in the effects of the FACTORS on the DV)
56. In ANCOVA, the covariates serve 2 main purposes:
• 1) To reduce residual error/variance
• 2) To "control" for possible confounding effects of the covariate(s)
57. In ANCOVA, it is desirable that the covariate be at least __ correlated with the DV, and at most __ correlated with the FACTOR of interest
moderately, weakly
58. The __ is based on the variance of the differences between conditions across participants.
This is the same as the __ between the __ and the __
The paired samples t-test is based on the variance of the differences between conditions across participants.

This is the same as the interaction between the factor (time) and the subjects variable (id).
59. Assumptions of repeated measures (3)
• 1. Independence of "subjects" (assume unrelated to each other)
• 2. Normal distribution within each cell
• 3. Sphericity
60. Sphericity =
Assumption that the VARIANCES of differences between each pair of within-subjects cells are EQUAL
61. Sphericity can be tested using:
• Mauchly test of sphericity
• e.g. W = .0004, p < 2.2e-16
• → REJECT the hypothesis that the variances of the differences are equal (sphericity is violated)
62. Why are the Greenhouse-Geisser and Huynh-Feldt corrections sometimes required by a repeated measures anova?
These corrections are applied to the degrees of freedom of an F-ratio in order to adjust for failure of the sphericity assumption in repeated measures anova.
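To make the correction concrete, the sketch below estimates the Greenhouse-Geisser $\hat\varepsilon$ from the covariance matrix of the within-subjects conditions (the usual Box/Greenhouse-Geisser formula) and multiplies both degrees of freedom of the F-ratio by it. $\hat\varepsilon = 1$ means sphericity holds exactly in the sample, and its lower bound is $1/(k-1)$. The data and function name are hypothetical, and the Huynh-Feldt variant (a less conservative adjustment of $\hat\varepsilon$) is not shown:

```python
def gg_epsilon(scores):
    """Greenhouse-Geisser epsilon from a subjects x conditions table (list of rows)."""
    n, k = len(scores), len(scores[0])
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sample covariance matrix of the k within-subjects conditions
    s = [[sum((row[a] - col_means[a]) * (row[b] - col_means[b]) for row in scores) / (n - 1)
          for b in range(k)] for a in range(k)]
    diag_mean = sum(s[j][j] for j in range(k)) / k
    grand_mean = sum(sum(r) for r in s) / k ** 2
    row_means = [sum(r) / k for r in s]
    num = k ** 2 * (diag_mean - grand_mean) ** 2
    den = (k - 1) * (sum(v ** 2 for r in s for v in r)
                     - 2 * k * sum(m ** 2 for m in row_means)
                     + k ** 2 * grand_mean ** 2)
    return num / den


# Hypothetical data: 5 participants x 3 conditions
scores = [[10, 12, 15], [8, 9, 14], [11, 13, 13], [9, 11, 17], [12, 12, 16]]
n, k = len(scores), len(scores[0])
eps = gg_epsilon(scores)
df1, df2 = eps * (k - 1), eps * (k - 1) * (n - 1)  # corrected degrees of freedom
```

With only two conditions there is a single difference score, so sphericity cannot fail and $\hat\varepsilon$ is always 1.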
63. Diagrams serve __ and __ functions
• Expository: explain/provide info
• Productive: generate new info
64. Graphs: (2)
• Diagrams that exhibit relationship between 2 sets of numbers as a set of points having coordinates determined by the relationship (plots).
• Used to illustrate relationships (charts)
65. The ggplot2 package invokes the following terminology (4)
• Aesthetics - maps data onto logical elements of graph
• Geoms (geometric objects) - specify how elements of graph are represented
• Themes - Modifies look/feel of graph elements
• Others
Author: peep_muri ID: 311080 Card Set: DRIP Updated: 2015-11-11 01:37:16 Tags: drip Description: drip