The flashcards below were created by user
on FreezingBlue Flashcards.
What are the forms of specification error?
What are the consequences for OLS?
Including an irrelevant independent variable; OLS is unbiased, inefficient and consistent.
Excluding a relevant independent variable; OLS is biased and inconsistent, although its variance is smaller (efficient).
Wrong functional form
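A small simulation makes the first two consequences concrete. This is a sketch under an assumed data-generating process (y = 1 + 2·x1 + 3·x2 + e, with x2 correlated with x1, plus an irrelevant x3 whose true coefficient is zero). The helper `ols` is hypothetical and is built on NumPy least squares.

```python
import numpy as np

# Assumed DGP for illustration only: y = 1 + 2*x1 + 3*x2 + e, with x2 correlated with x1
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # relevant variable, correlated with x1
x3 = rng.normal(size=n)              # irrelevant variable (true coefficient 0)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_correct = ols(y, x1, x2)          # correct specification
b_omitted = ols(y, x1)              # excludes relevant x2: biased slope on x1
b_extra = ols(y, x1, x2, x3)        # includes irrelevant x3: still unbiased
print(b_correct[1], b_omitted[1], b_extra[1])
```

With these settings the slope on x1 in the omitted-variable regression should land near 2 + 3 × 0.5 = 3.5. That value is the true effect plus x2's effect times the slope of x2 on x1. Both the correct and the over-specified regressions should stay near 2.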
Relevance of stepwise regression to policy
popular tool for dealing with the specification problem, at least as it relates to the choice of explanatory variables; searches through subsets of the explanatory variables, adding or dropping them one at a time; addresses the problem of "choice of variables".
- hidden assumptions:
- *there is a true model
- *sample is a near-perfect representation of the population, so overfitting is not a problem
- *data are capable of telling you something about the true specification
a modified forward routine that checks the impact of each new variable on the significance of the variables already in the model; existing variables are dropped if they become nonsignificant as other predictors are added.
perils of stepwise regression
can violate sampling theory: inference is based on sampling from a population, yet stepwise regression focuses narrowly on fitting the best model to the one sample at hand; problem of over-fitting the data.
since variables are chosen because they look like good predictors, estimates of anything associated with prediction can be misleading: magnitudes of regression coefficients and t statistics tend to be larger than they really are; standard errors are smaller than what would be observed; CIs tend to be too narrow; p values are too small; R2 and adjusted R2 are too large; the overall F ratio is too large and its p value too small; the standard error of the estimate is too small.
- *stepwise p values are nominally, not statistically, significant
- *a troublesome feature of stepwise is that the characteristics of the reported model can change dramatically, with some variables entering and others leaving
- *stepwise must exclude observations that are missing any of the potential predictors. However, some of these observations will not be missing any of the predictors in the final model. Sometimes one or more of the predictors in the final model are no longer statistically significant when the model is fitted to the data set that includes these set-aside observations, even when values are missing at random.
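The "nominally significant" point can be checked with a quick simulation. This sketch assumes pure-noise data: y and 50 candidate predictors are independent standard normals. Each search keeps only the single best-looking predictor, as the first step of a forward search would. The p-values use a normal approximation to the t distribution, an assumption that is reasonable at n = 100. The helper `min_pvalue_on_noise` is hypothetical.

```python
import math
import numpy as np

def min_pvalue_on_noise(rng, n=100, p=50):
    """Smallest slope p-value among p simple regressions of pure-noise y on
    pure-noise candidates (assumed setup: none of them truly matters)."""
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # correlation of y with each candidate
    r = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # t statistic of the slope in a simple regression: r * sqrt((n-2)/(1-r^2))
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    # two-sided p-values, normal approximation to the t distribution
    pvals = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return min(pvals)

rng = np.random.default_rng(1)
share = np.mean([min_pvalue_on_noise(rng) < 0.05 for _ in range(500)])
print(f"share of noise-only searches with a 'significant' winner: {share:.2f}")
```

Since 1 − 0.95^50 ≈ 0.92, roughly nine searches in ten should turn up a winner with p < 0.05, even though no predictor is real.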
starts with an empty model; the variable with the smallest p value is placed in the model. variables are added one at a time as long as their p values are small enough (each step adds the variable with the smallest p value in the presence of the predictors already in the equation)
starts with all predictors in model. variable that is least significant, the one with the largest p value, is removed and model is refitted. each subsequent step removes the least significant variable until all remaining variables have individual p values smaller than some value.
disadv: NEED A LOT OF VARS
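The forward routine described above can be sketched in a few lines. This is an illustration, not any package's implementation. `ols_pvalues` and `forward_select` are hypothetical helpers. The p-values use a normal approximation to the t distribution, which assumes a reasonably large n, and the entry threshold `alpha` is an arbitrary choice.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS of y on an intercept plus the columns of X; returns two-sided
    p-values for the slopes (normal approximation to the t distribution)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    t = beta / se
    return [math.erfc(abs(ti) / math.sqrt(2)) for ti in t[1:]]

def forward_select(X, y, alpha=0.05):
    """Start from an empty model; at each step add the candidate whose p-value,
    given the variables already included, is smallest, while it is below alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_j, best_p = None, 1.0
        for j in remaining:
            p_j = ols_pvalues(X[:, selected + [j]], y)[-1]   # p-value of candidate j
            if p_j < best_p:
                best_j, best_p = j, p_j
        if best_j is None or best_p >= alpha:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 5))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # assumed DGP: only columns 0 and 3 matter
selected = forward_select(X, y)
print(selected)
```

A backward routine would run in the other direction. It would start from all columns and repeatedly drop the one with the largest p-value.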
Other specification tests
- *AIC vs SIC: usable whether or not the models are nested; the one w/ the lower AIC (or SIC) is better; appropriate for large samples; both penalize additional regressors more harshly than adjusted R2, SIC more harshly than AIC
- *non-nested F test for "omnibus models"
- *MSE: cannot really compare models that have different dependent variables; measures ex post predictive power
- *Ramsey RESET
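For OLS with normal errors, one common way to write the two criteria is AIC = n·ln(RSS/n) + 2k and SIC = n·ln(RSS/n) + k·ln(n), with k the number of estimated coefficients. Other forms, such as ones built on −2·lnL, add terms that are the same for every model fitted to the same data, so the rankings are unaffected. The sketch below uses an assumed DGP in which only x1 matters. `info_criteria` is a hypothetical helper.

```python
import numpy as np

def info_criteria(X, y):
    """AIC and SIC for an OLS fit of y on an intercept plus the columns of X,
    written as n*ln(RSS/n) + penalty (model-independent constants dropped)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    k = Z.shape[1]                               # number of estimated coefficients
    aic = n * np.log(rss / n) + 2 * k
    sic = n * np.log(rss / n) + k * np.log(n)
    return aic, sic

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)              # assumed DGP: x2 is irrelevant

aic_a, sic_a = info_criteria(x1[:, None], y)                 # model A: x1
aic_b, sic_b = info_criteria(np.column_stack([x1, x2]), y)   # model B: x1 + x2 (nests A)
aic_c, sic_c = info_criteria(x2[:, None], y)                 # model C: x2 (non-nested with A)
print(f"A: {aic_a:.1f}/{sic_a:.1f}  B: {aic_b:.1f}/{sic_b:.1f}  C: {aic_c:.1f}/{sic_c:.1f}")
```

Model C is non-nested with respect to A, yet the criteria still rank the two. The gap between SIC and AIC grows by ln(n) − 2 for each added coefficient. That is why SIC is the harsher of the two once n ≥ 8.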
Ramsey RESET test
- *"an omitted variable test"
- *a test of nonlinearities in the choice of functional form
- *add powers of the predicted y hats to the model: yhat squared, cubed, and to the fourth power
- *if the F test of these coefficients is significant, evidence of specification error (omitted variables)
- *"estat ovtest" in stata
- *misspecification of functional form
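The RESET recipe can be sketched with NumPy: fit the model, add powers of ŷ, and F-test them jointly. This mirrors the idea behind Stata's `estat ovtest`, which uses powers 2 to 4 of the fitted values, but it is a hand-rolled illustration. `rss` and `reset_F` are hypothetical helpers. The p-value step is left out; compare the statistic with an F(q, n − k) critical value.

```python
import numpy as np

def rss(Z, y):
    """Residual sum of squares from an OLS fit of y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.sum((y - Z @ beta) ** 2))

def reset_F(x, y, max_power=4):
    """RESET F statistic: add yhat^2 ... yhat^max_power to a linear model
    y = a + b*x and test their joint significance."""
    n = len(y)
    Z = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    yhat = Z @ beta
    powers = np.column_stack([yhat ** p for p in range(2, max_power + 1)])
    Z_aug = np.column_stack([Z, powers])
    q = powers.shape[1]                                  # number of restrictions
    rss_r, rss_u = rss(Z, y), rss(Z_aug, y)
    return ((rss_r - rss_u) / q) / (rss_u / (n - Z_aug.shape[1]))

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 3, n)
y_linear = 1 + 2 * x + rng.normal(size=n)        # assumed DGP: linear form is correct
y_quad = 1 + x + x ** 2 + rng.normal(size=n)     # assumed DGP: linear form is wrong
print(reset_F(x, y_linear), reset_F(x, y_quad))
```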
Mainstream Bayesian perspective
- Whether a variable "belongs in the model or not" is equivalent to asking whether there is substantial prior probability in the neighborhood of zero for the associated coefficient. If there is little probability around zero, then the variable associated with that coefficient should be included in the model (ie, our priors indicate that it is an important variable)
Radical Bayesian perspective
- There is no such thing as a "true" model specification. It's more important to be able to forecast the next event or to provide the best information available on the first (partial) derivative associated with an explanatory variable, and to be able to account for the uncertainty associated with the forecast or the marginal effect, than it is to reify the objective existence of a "true population parameter" or of a "true, correct model"
Leamer's Extreme Bounds Analysis, "Pragmatic Bayesian Approach"
EBA is a second best approach to specification problems
- (1) CLASSIFYING RHS VARS: Can be thought of as separating the explanatory variables into three sets: (1) the key inferential variable or variables; (2) the variables that are strongly motivated by theory, including basic control variables; and (3) "doubtful variables": variables whose coefficients you have only weak priors about, which may have been suggested by the literature but which you do not feel are extremely important
- (2) EXTREME VALUES OF KEY POLICY VARIABLE LYING W/IN THE 90% DATA CONFIDENCE ELLIPSE
- (3) refinements: do more than simply record extreme values; the information contract curve
- *entertains multiple theoretical perspectives
- *carrying out a series of basic EBAs, each of which is based on a competing substantive theory
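The core bookkeeping of a basic EBA can be sketched as follows. Always include the key variable and the theory-motivated controls, cycle through every subset of the doubtful variables, and record the smallest and largest coefficient on the key variable. This is a simplified sketch on assumed simulated data. It records point estimates only, whereas step (2) works with extreme values inside the 90% confidence ellipse. `extreme_bounds` is a hypothetical helper.

```python
import itertools
import numpy as np

def extreme_bounds(y, key, free, doubtful):
    """Min and max OLS coefficient on `key` across all subsets of `doubtful`,
    always controlling for the `free` (theory-motivated) variables."""
    n = len(y)
    coefs = []
    for r in range(len(doubtful) + 1):
        for subset in itertools.combinations(doubtful, r):
            Z = np.column_stack([np.ones(n), key, *free, *subset])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            coefs.append(beta[1])          # coefficient on the key variable
    return min(coefs), max(coefs)

# Assumed simulated data for illustration
rng = np.random.default_rng(5)
n = 2_000
key = rng.normal(size=n)
z = rng.normal(size=n)                           # free (theory-motivated) control
d1 = 0.8 * key + rng.normal(size=n)              # doubtful, but it matters and tracks key
d2, d3 = rng.normal(size=n), rng.normal(size=n)  # doubtful and irrelevant
y = 1 + 1.0 * key + 0.5 * z + 1.0 * d1 + rng.normal(size=n)
lo, hi = extreme_bounds(y, key, [z], [d1, d2, d3])
print(round(lo, 2), round(hi, 2))
```

Here the bounds should run from about 1.0 (d1 included) to about 1.8 (d1 excluded). The sign of the key effect is robust, but its size is fragile.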
Application of EBA; policy implications
capital punishment as a deterrent? researchers hold conflicting prior beliefs on capital punishment. EBA allows researchers to reach a mutually accepted conclusion if the data are sufficiently strong, although the framework may be inadequate for resolving the issue.
- The most important thing is the sensitivity of critical inferences to plausible changes in the model specification. The "truth-seeking" goal of finding the "single correct specification" may have a place in the hard sciences, but it is typically not very useful in policy settings.
Difference b/w EBA & Stepwise
both second best alternatives; one is bottom-up approach to statistics, other top-down
EBA: how Bayesian thinking can guide good practice; you don't have to be a Bayesian purist; it's the framework that matters
no theory behind stepwise
Measurement error in dependent variable
- Errors get swept into the error term.
- OLS is inefficient, yet consistent and unbiased
Measurement error in independent variable
- OLS generates parameter estimates that are systematically biased toward zero. This is a kind of endogeneity in which OLS becomes biased and inconsistent. It consequently becomes more difficult to reject null hypotheses.
- If your coefficients are significant despite the problem of measurement error in the explanatory variables, you have a robust result and there is often no need to worry about measurement error in X. At the very least, you have a significant lower bound for the marginal effect
Bias doesn't go away as the number of observations increases
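The standard textbook result for classical measurement error is plim b̂ = β·var(x*)/(var(x*) + var(u)). It assumes the error u is uncorrelated with the true x* and with the regression error. A simulation sketch under that assumption shows both the shrinkage toward zero and that the shrinkage does not fade with more observations.

```python
import numpy as np

rng = np.random.default_rng(6)
beta = 2.0
estimates = {}
for n in (1_000, 100_000):
    x_true = rng.normal(0, 1, n)             # var(x*) = 1
    x_obs = x_true + rng.normal(0, 1, n)     # assumed classical measurement error, var(u) = 1
    y = 1 + beta * x_true + rng.normal(0, 1, n)
    # OLS slope of y on the mismeasured regressor
    estimates[n] = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Theory: plim = beta * 1 / (1 + 1) = 1.0, whatever the sample size
print(estimates)
```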
What happens with restrictions on the dep var?
- in many cases OLS is biased & inconsistent.
- OLS' optimal properties break down.
- degree of bias depends on how effectively constraining the restriction is.
- need to resort to MLE in order to estimate the parameters.
Situations in which OLS breaks down and the solutions
- Heteroskedasticity: WLS form of GLS
- Autocorrelation: GLS
- Specification error: theory; sensitivity analysis (EBA)
- Measurement error: formula for theoretical bias
Binary dummy dependent variables
attractive property of the LPM (OLS on a binary dependent variable): it produces predictions that can be interpreted as probabilities that Y=1.
- the parameter estimates of the slopes can be interpreted as marginal probabilities that y=1. this feature makes the LPM easier to interpret than other models of binary dependent variables.
good when the dataset is large and few predicted values are near the bounds.
Tools: regular OLS (LPM), logit MLE (cumulative logistic distribution function), probit MLE (cumulative normal distribution function)
shortcomings of LPM
- *built-in heteroskedasticity: the error variance, p(1-p), changes with X
- *predicted probabilities are not bounded by 0 and 1
- *R2 is uninterpretable
- *error term is not plausibly normally distributed (implications on standard errors, t-tests, inferences)
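Two of these shortcomings are easy to see in a sketch. Assume, for illustration only, that the true probability follows a logistic curve in x. OLS on the 0/1 outcome then produces fitted "probabilities" outside [0, 1], and the error variance p(1 − p) moves with x.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x = rng.normal(0, 2, n)
p_true = 1 / (1 + np.exp(-2 * x))              # assumed true P(y=1 | x)
y = (rng.uniform(size=n) < p_true).astype(float)

Z = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)   # the LPM is plain OLS on a 0/1 outcome
p_hat = Z @ beta
share_outside = np.mean((p_hat < 0) | (p_hat > 1))

# Error variance implied by the LPM fit, p(1 - p): largest near p = 0.5,
# smallest near the bounds, so it cannot be constant across observations.
p_clip = np.clip(p_hat, 0, 1)
err_var = p_clip * (1 - p_clip)
print(f"slope {beta[1]:.3f}, share of fitted values outside [0, 1]: {share_outside:.2f}")
```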
remedy to problems of LPM
- *constrain y hats to lie between 0 and 1. gives a Z-shaped regression curve
- *problem: it predicts that some things will certainly take place and others certainly not; statisticians don't like a 100% probability of something occurring
- *transform values of the dependent variable via the logistic transformation
- *convert into "log odds" = problem if the dummy is strictly binary, since log(p/(1-p)) is undefined at p = 0 or p = 1
MLE perspective on problems posed by the LPM: binary logit and probit
- *Bernoulli structure of the dependent variable (ie, a string of zeros and ones)
- *Each outcome y=1 probability p; y=0 probability 1-p.
- *p=F(Xβ), where F(Xβ) is known as a "linkage function" because it links the linear index Xβ to p
- *cumulative logistic function F(Xβ) structure: $F(X\beta) = \dfrac{e^{X\beta}}{1 + e^{X\beta}}$
- *likelihood $L(\beta) = \prod_i F(X_i\beta)^{y_i}\,[1 - F(X_i\beta)]^{1 - y_i}$: max this (in practice its log), get MLEs
- *F() is the function that links Xβ to p
- *F() can represent the linear model (leading to the LPM), the cumulative logistic function, the cumulative normal distribution (probit, "normit"), the Poisson distribution, the negative binomial distribution, or any one of a family of functions that generate sigmoid curves bounded by zero and one
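Maximizing the Bernoulli log-likelihood for the logistic F(·) can be sketched with Newton-Raphson. The score is Σ(yᵢ − pᵢ)xᵢ and the Hessian is −Σpᵢ(1 − pᵢ)xᵢxᵢ'. This is a bare illustration with no step-halving and no separation checks, so it would fail under "complete separation". `logit_mle` is a hypothetical helper, and the data are simulated from assumed coefficients.

```python
import numpy as np

def logit_mle(X, y, tol=1e-10, max_iter=50):
    """Binary logit by Newton-Raphson. X must include a column of ones."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ b))               # p = F(Xb), cumulative logistic
        score = X.T @ (y - p)                       # gradient of the log-likelihood
        hessian = -(X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, score)
        b = b - step                                # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return b

rng = np.random.default_rng(8)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_b = np.array([-0.5, 1.5])                      # assumed for the simulation
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_b))).astype(float)
b_hat = logit_mle(X, y)
print(b_hat)
```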
difference b/w logistic Fxb and probit Fxb
- *Probit lends itself more naturally to the economics of random utility theory and is sometimes preferred on these grounds
random utility theory
- *an individual has a continuous utility or index function that is unobserved. this is the real dependent variable. it reveals itself as a 1 or 0 depending on whether or not the index function crosses a subjective threshold.
- *latent dependent variable interpretation for discrete choice models (logit, probit models)
- *coefficients are difficult to interpret, turn to odds ratios
- *measures of gof not as tidy
- *displaying results is messy
- *when y takes on more than two categorical values (i.e. Y is multinomial), logit imposes the restriction of "independence of irrelevant alternatives"
- *problem of "complete separation" when there is no overlap in y=0/1 values with respect to x -> cases of perfect prediction are dropped from estimation
logit measures of gof
- *pseudo R2
- * contingency table of predicted vs actual values of y given a certain threshold probability (.5)
- *AIC SIC
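Two of these measures are simple to compute once a model has produced fitted probabilities. This sketch assumes McFadden's version of the pseudo R2, 1 − lnL(model)/lnL(intercept only); other pseudo R2s exist. It also assumes a 0.5 threshold for the contingency table. The toy `y` and `p_hat` values are made up.

```python
import numpy as np

def mcfadden_r2(y, p_hat):
    """McFadden pseudo-R2 = 1 - lnL(model) / lnL(intercept only)."""
    ll_model = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    p0 = y.mean()                                   # intercept-only fitted probability
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    return 1 - ll_model / ll_null

def classification_table(y, p_hat, threshold=0.5):
    """2x2 counts: rows = actual y (0, 1), columns = predicted y (0, 1)."""
    pred = (p_hat >= threshold).astype(int)
    table = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y.astype(int), pred):
        table[actual, predicted] += 1
    return table

# made-up toy values for illustration
y = np.array([0, 0, 1, 1, 1, 0])
p_hat = np.array([0.2, 0.6, 0.7, 0.9, 0.4, 0.1])
print(classification_table(y, p_hat))
print(round(mcfadden_r2(y, p_hat), 3))
```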
Ordinal or multinomial world
- When the dependent variable takes on more than two categories of values
probit vs logit in ordinal and multinomial settings
- *probit is more flexible and is preferred.
- *Logit models impose the rigid assumption of "independence of irrelevant alternatives"; logit doesn't work well when the alternatives are close substitutes or close complements. Probit permits alternatives to possess a degree of substitutability or complementarity.
- *Multinomial probit, like binary probit, lacks the (relative) convenience of lending itself to creating odds ratios. It is therefore difficult to interpret estimated coefficients directly
independence of irrelevant alternatives
- *assumption: outcomes of the categorical or ordinal dependent variable are completely independent.
- (Hausman test)
- *no close associations among the response categories or no correlation in errors across categories
logit and probit problems
- *sensitive to model specification and multicollinearity
- *"complete separation"
censored dep vars (tobit)
- *you have a subset of the observations clumped on either an upper or lower bound (or both)
- *we observe all of the x values, but not the genuine y values that fall outside the bounds. censored values of y are reported as being equal to the value of the bound that is being exceeded.
truncated dep vars (tobit)
- *you simply have a sample in which observations do not all have the same probability of having been included in the sample
- *we know all of the correct values of x and y, it's just that y never exceeds the bounds
tobit ML estimator, IMR
- *deals with censoring, truncation, survival analysis, duration, selection bias
- *appropriate version of "inverse Mills ratio" (IMR) into respective model
- *IMR: a sort of "hazard function", a measure of the distortion caused when observations approach the limiting bound(s). When included in the model it corrects for the bias induced by truncation or censoring
- *The appropriate version of the IMR depends on the nature of the problem and whether observations are constrained from above or from below.
- *bias due to omitting a relevant variable, namely some relevant version of the IMR
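For a normal variable y ~ N(μ, σ²) truncated from below at c, the IMR is λ(α) = φ(α)/(1 − Φ(α)) with α = (c − μ)/σ, and E[y | y > c] = μ + σλ(α). Truncation from above flips this to E[y | y < c] = μ − σφ(α)/Φ(α). The sketch below checks the below-truncation formula against simulated draws. The parameter values are arbitrary, and the helper names are hypothetical.

```python
import math
import numpy as np

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def imr_below(c, mu, sigma):
    """Inverse Mills ratio for a normal variable truncated from below at c."""
    a = (c - mu) / sigma
    return phi(a) / (1 - Phi(a))

mu, sigma, c = 1.0, 2.0, 2.0                      # arbitrary illustrative values
theory = mu + sigma * imr_below(c, mu, sigma)     # E[y | y > c]

rng = np.random.default_rng(10)
draws = rng.normal(mu, sigma, 1_000_000)
simulated = draws[draws > c].mean()
print(round(theory, 4), round(simulated, 4))
```

The gap between theory and μ (here about 2.28) is exactly the distortion that OLS on a truncated sample leaves in the error term when the IMR is omitted.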
- *maximum likelihood for handling dependent variables that can be thought of as "counts" (non-negative whole numbers)
- *restrictive assumption that the mean is equal to the variance
- *If the counts in the dataset do not include many zero values or are not close to zero, then regular OLS regression is fine
- *If the mean is not plausibly equal to the variance, we have "overdispersion" (the Poisson model is not appropriate; checked via Hausman test)
- *maximum likelihood negative binomial regression is less restrictive than Poisson
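An informal first look at overdispersion is the variance-to-mean ratio of the counts. It is about 1 under Poisson and well above 1 when the counts are overdispersed. The sketch below uses simulated counts. NumPy's negative binomial with n = 2 and p = 0.25 has mean 6 and variance 24. This is a descriptive check, not a formal test, and `dispersion_ratio` is a hypothetical helper.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance divided by sample mean: near 1 for Poisson counts."""
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(12)
poisson_counts = rng.poisson(lam=6, size=50_000)
nb_counts = rng.negative_binomial(n=2, p=0.25, size=50_000)   # mean 6, variance 24
print(dispersion_ratio(poisson_counts), dispersion_ratio(nb_counts))
```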
measurement scale of variables and relationship b/w measurement scale and general linear inference
- can put interval or ratio scale variables into nominal categories in contingency tables
- *put ordinal variables into OLS as dummy variables; test significance of marginal impacts via F-tests
longitudinal, combining cross-section with time-series data
cross-section time series data structures
FEM, REM, SUR
Pooled: ignores this structure
advantages of panel
- *greater variation in x
- *less collinearity
- *autocorrelation & heteroskedasticity affect standard errors in opposite directions
- *greater df
- *better capture the dynamics of change=> incorporates "long-term" effects of cross-section & "short-term adjustment effects" of time-series
DW stat in cross time datasets
- *may not work properly
- *computer usually doesn't know that the dataset has this special structure, so it calculates the DW stat as if the entire pooled dataset had a purely time-series structure
FEM: cross-sectional effects only ("one-way fixed effects")
- *dummy variables for each cross-section
- *Type II model spec
- *include dummy interaction terms as well (Type IV)
- *"control for unobservable time-invariant [unit-specific] heterogeneity across the cross-sectional units"
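One-way fixed effects can be estimated in two ways: with one dummy per cross-sectional unit (LSDV), or by demeaning every variable within its unit (the "within" transformation). The two give the same slope. The sketch below uses simulated panel data in which x is correlated with an unobserved, time-invariant unit effect, so pooled OLS is biased. The DGP is an assumption for illustration.

```python
import numpy as np

# Assumed DGP: y = alpha_i + 1.5*x + e, with x correlated with alpha_i
rng = np.random.default_rng(8)
N, T = 200, 10
unit = np.repeat(np.arange(N), T)               # unit id for each observation
alpha = rng.normal(0, 2, N)                     # unobserved unit effects
x = alpha[unit] + rng.normal(size=N * T)        # x correlated with the unit effect
y = alpha[unit] + 1.5 * x + rng.normal(size=N * T)

# LSDV: one dummy per unit, no common intercept (avoids perfect multicollinearity)
D = (unit[:, None] == np.arange(N)).astype(float)
b_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)

# Within transformation: demean x and y by unit, then OLS without an intercept
def demean(v):
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

xd, yd = demean(x), demean(y)
b_within = (xd @ yd) / (xd @ xd)

# Pooled OLS ignores the panel structure and is biased here
b_pooled, *_ = np.linalg.lstsq(np.column_stack([np.ones(N * T), x]), y, rcond=None)
print(b_lsdv[0], b_within, b_pooled[1])
```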
FEM: time effects only
- *"control for unobservable unit-invariant heterogeneity across time"
- *Using dummy variables for time is a much gentler and more flexible way to deal with time than simply including time as a trend variable; tradeoff between flexibility and efficiency. Adding a set of dummy variables consumes more degrees of freedom than does a simple time-trend variable. In small samples, efficiency may be more valuable than flexibility
fixed cross-sectional plus time period effects
- *dummy for cross and time (minus one in each case to avoid perfect multi)
- *coefficients of DVs for time: "holding the level effects of time constant"; not of interest
- *include dummy interaction terms for cross section (TYpe IV extensions to FEM)
difference b/w REM and FEM
- REM: forces the error components to be independent of both the exp vars and overall error term
- FEM: not the case
(3) FEM: unbiased & consistent; REM is more efficient if big N and small T, but biased to the extent that the error term is correlated w/ the ind vars.
(4) If the cross-sectional units in the sample represent the entire population of units, then FEM is preferred over REM.
Hausman test for REM: if the null (REM is consistent) is rejected, stick w/ FEM
- *empirical Bayes models ("shrinkage estimators", "Stein estimators") all boil down to a form of random effects model
- *REMs specify that the intercept parameter and slope parameters are random variables (this departs from the OLS assumption of true, fixed parameters; consistent w/ Bayesian thinking, problematic for sampling theorists)
- *random intercept term that varies across cross-sectional units (Type II)
- *fixed intercept overall ("grand intercept"), and the intercept for each cross-section is a random perturbation off of this grand intercept
ML estimator for a given cross-sectional unit has general form of a shrinkage estimator
A parameter is a weighted average of the grand parameter estimate and the estimate for the cross-sectional unit in question, the weights being in proportion to the precision of their respective parameter estimates
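The weighting rule in this card can be written as a one-line function. The sketch assumes the variances are known. `unit_var` is the sampling variance of the unit's own estimate, and `grand_var` is the variance attached to the grand estimate; in an REM this plays the role of the between-unit variance. The names are hypothetical.

```python
def shrink(unit_est, unit_var, grand_est, grand_var):
    """Precision-weighted average of a unit-specific estimate and the grand estimate."""
    w_unit = 1 / unit_var        # precision of the unit's own estimate
    w_grand = 1 / grand_var      # precision attached to the grand estimate
    return (w_unit * unit_est + w_grand * grand_est) / (w_unit + w_grand)

# A noisy unit estimate (10, variance 4) is pulled strongly toward the grand mean (6, variance 1)
print(shrink(10.0, 4.0, 6.0, 1.0))   # 6.8
# A precise unit estimate (variance 0.01) barely moves
print(shrink(10.0, 0.01, 6.0, 1.0))
```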
ML estimator and Bayesian estimator
same, except instead of having a cross-sectional parameter shrink toward a subjective prior, it shrinks toward the empirical grand mean
Bayesian shrinkage formula
assumption that the error term is asymptotically normally distributed
Seemingly Unrelated Regression
- *permits "contemporaneous correlation" (correlation at the same point in time) among the error terms of two or more parallel models
- *SUR relaxes assumption of no correlation in the errors across the models when running time-series models separately (Type V)
- *Errors are homoskedastic within each cross-section and have no autocorrelation, but are typically groupwise heteroskedastic across the cross-sections (Type V). Correlation occurs across cross-sections
- * generally sharper standard errors for coefficients than a Type V specification
- *accounts for causal forces that are not explicitly identified, yet which "inhere in the environment in which the data are generated".
Borrowing strength by shrinking the maximum likelihood estimate toward a grand mean improves predictive power, albeit at some price in terms of (Bayesian) bias
Hierarchical Linear Models
- Sometimes data generated at one level of analysis can be thought of as being "nested" within higher-level clusters.
- Special case of REM.
- Relationship is cross-sectional; one level of subjects in which a set of occasions is embedded. (Panel, crossed factor, HLM, nested factor)
Empirical Bayes Estimators
- Stein's Paradox
- Empirical Bayes shrinkage estimators: a form of nonparametric smoothing, in that they smooth MLEs toward the prior in the form of an empirical "grand mean"
- REMs, HLM
- Way to incorporate seemingly extraneous information into our analysis in order to sharpen our predictions and inferences. Good for messy data, dealing with unobserved variables and effects w/o having to define what they are.
- Simple linear regression: prediction "borrows strength" from observations made at different levels. Takes prediction for specific level and "shrinks" it toward the conditional mean for all levels. Greater df, sharper standard error than the more restrictive prediction.
- FEM with Type 4 structure: borrow strength from all the different cross-sectional units to help estimate the parameters for each of the units individually (DVs capture the "level effects" of unobserved factors specific to cross-sectional units or to time periods)
- SUR: borrows strength by permitting there to be contemporaneous correlation in the errors across cross-sectional models. Accounts for unspecified effects that affect pairs of cross-sectional units similarly.
- REM (including HLM): borrow strength by shrinking the parameter estimates for specific cross-sections toward the grand mean effect for the ensemble of all cross-sectional units (AKA empirical Bayes theory).
- Classical Bayesian models: borrow strength from the prior by shrinking the maximum likelihood estimates toward the prior. MLEs modified in light of information drawn from theory or previous experience.
Accounting for effects of unobserved variables
- FEM, REM: capture the level effects of unobservables
- SUR: capture contextual "environmental effects"
- Tobit: deal w sample selection & self selection bias
- HLM: capture unobserved effects related to how data may be nested hierarchically
- Lagged variables for data with a time dimension:
- *Models with lagged dependent variables on the righthand side = "autoregressive models"
- *Models with lagged independent variables= "distributed lag models"
- *Models with first differences on either the lefthand or righthand side (or both)
How many lags to include?
- Pragmatic, bottom-up response: Test the coefficients of lagged terms, keep lagging until they become insignificant
- Infinite lags: The Koyck transformation (converting infinite distributed lags of X to an autoregressive specification)
- Polynomial lags: The Almon lag
Allowing the intercept and slopes adapt over time via Bayesian updating. This eliminates the problem of having your coefficient estimates depend on which particular slice of time your sample represents. Kalman filter models are asymptotically related to ARIMA time-series estimation.
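A minimal version of this idea is the local level model, in which only the intercept drifts: y_t = μ_t + e_t and μ_t = μ_{t−1} + w_t. The sketch below assumes both variances are known; in practice they are estimated. It lets the intercept jump partway through the sample, and each step is a Bayesian update of the previous belief about μ_t. Letting the slopes drift as well would require the matrix form of the filter. `local_level_filter` is a hypothetical helper.

```python
import numpy as np

def local_level_filter(y, obs_var, state_var, m0=0.0, p0=1e6):
    """Kalman filter for a drifting intercept; returns filtered means of mu_t."""
    m, p = m0, p0                       # prior mean and variance of mu (diffuse start)
    filtered = []
    for yt in y:
        p = p + state_var               # predict: uncertainty grows as mu may drift
        gain = p / (p + obs_var)        # Kalman gain: weight on the new observation
        m = m + gain * (yt - m)         # update: posterior mean of mu_t
        p = (1 - gain) * p              # posterior variance of mu_t
        filtered.append(m)
    return np.array(filtered)

rng = np.random.default_rng(9)
true_level = np.r_[np.zeros(100), np.full(100, 5.0)]   # assumed: intercept shifts at t = 100
y = true_level + rng.normal(size=200)
mu_hat = local_level_filter(y, obs_var=1.0, state_var=0.1)
print(mu_hat[95:105].round(2))
```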
- generic solution to the problem of endogeneity: correlation between the error term and one or more of the RHS variables (problem: OLS is biased and inconsistent, with wrong SEs; seen this with measurement error in the Xs and with omitted vars).
- meet two criteria: (1) they are highly correlated with the endogenous explanatory variable in question, and; (2) have zero correlation with the unobserved factors affecting y (indp of error term in PRF)
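With a single endogenous regressor and a single instrument, the IV slope is cov(z, y)/cov(z, x). The sketch below assumes a DGP in which an unobserved factor v drives both x and y, so OLS is biased upward. The instrument z satisfies both criteria: it is correlated with x and independent of v and e.

```python
import numpy as np

# Assumed DGP for illustration
rng = np.random.default_rng(11)
n = 100_000
z = rng.normal(size=n)                       # instrument
v = rng.normal(size=n)                       # unobserved factor
x = 0.8 * z + v + rng.normal(size=n)         # x is endogenous: it depends on v
y = 1 + 2 * x + 3 * v + rng.normal(size=n)   # v also sits in y's error term

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # simple IV estimator
print(round(b_ols, 3), round(b_iv, 3))
```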
a measurable variable that is highly correlated with a second variable that is usually either unobserved or difficult to measure
note about instrumental variables
- difficult to come up with credible instrumental variables.
- IVs are credible in simultaneous equation systems.
Multivariate analysis: simultaneous equation models
- causal arrow diagrams.
- Models in which the dependent variable of one model enters as an explanatory variable in another model
- Models in which an exogenous variable affects two endogenous variables simultaneously
- Models in which one or more identities link dependent variables across models
difference b/w simultaneous equations vs SUR
- Simultaneous: models are linked together via the systematic parts of the model
- SUR: linked via the error term
uncorrelated with error terms.
lagged endogenous vars are considered predetermined
- induces a correlation between the error term and one or more of the right-hand side variables in one or more of the individual structural equations
- OLS is a biased and inconsistent estimator
- poses problem of "endogeneity".
- 2 problems: identification and estimation
structural model vs reduced-form
- There are as many reduced-form models as there are endogenous variables in the system. Each one of them expresses an endogenous variable as a function of all of the exogenous and pre-determined variables in the system.
- The inability to algebraically manipulate reduced-form parameter estimates to come up with unique structural parameter estimates.
- Underidentified: no logical way to come up with unique parameter estimates; such a way exists for exactly identified and overidentified equations
- If you are not concerned with drawing inferences about individual parameters but merely interested in forecasting, there's no problem: the reduced-form equations make perfectly good forecasting models.
- An equation w/in a sim system is identified to the extent that the other equations in the system include pre-det vars. Adding additional pre-det (exo) vars to an underidentified eq will not help to identify its own parameters.
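way to check identifiability (order condition): K - k ≥ m - 1 (equality: exactly identified; strict inequality: overidentified; fails: underidentified)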
- K:# of exo vars in entire system
- k: # of exo in eq
- m:# of endo in eq
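The order condition is mechanical enough to write down. This is a sketch with a hypothetical helper. Keep in mind that the condition is necessary but not sufficient for identification; the rank condition is the full test.

```python
def order_condition(K, k, m):
    """Order condition K - k >= m - 1 for one equation in a simultaneous system.
    K: exogenous/predetermined variables in the whole system
    k: exogenous/predetermined variables in this equation
    m: endogenous variables in this equation (including its dependent variable)"""
    excluded, needed = K - k, m - 1
    if excluded < needed:
        return "underidentified"
    if excluded == needed:
        return "exactly identified"
    return "overidentified"

# Assumed textbook-style market: demand Q = a0 + a1*P + a2*Income + u1,
# supply Q = b0 + b1*P + u2. Endogenous: Q, P. Exogenous in the system: Income (K = 1);
# intercepts are left out of both counts (counting them in both gives the same answer).
print(order_condition(K=1, k=1, m=2))   # demand: underidentified
print(order_condition(K=1, k=0, m=2))   # supply: exactly identified
```

Supply is identified because Income, which it excludes, shifts demand and traces out the supply curve. Demand excludes nothing, so it cannot be recovered.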
OLS becomes biased & inconsistent; using OLS to estimate the first equation while ignoring the presence of the 2nd equation is inappropriate; the system of equations must be estimated simultaneously.
Indirect Least Squares
- for exactly identified equations in a simultaneous system.
- exactly one way to translate the Π coefficients from the reduced-form system into the β coefficients in the structural model
Two-Stage Least Squares
- for estimating just-identified and overidentified equations in a system of simult eq.
- key workhorse of simult eq estimation.
- builds directly on and trumps ILS.
- method that yields unique parameter estimates for the coefficients even when the eq is over-identified.
- If the first stage of 2SLS is a poor fit (low R2, low t), then the system may be identified, but the instrument (predicted value of the endogenous RHS var) is weak & 2SLS estimates can be badly biased (toward OLS) and unreliable.
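The two stages can be sketched directly with least squares. `X` holds the intercept and the endogenous regressor. `Z` holds the intercept and the excluded instruments. The simulation assumes two instruments for one endogenous variable, so the equation is overidentified. `two_sls` is a hypothetical helper. The point estimates are 2SLS, but standard errors from the naive second-stage regression are wrong, because they use residuals built from fitted rather than actual X.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS point estimates: stage 1 projects X on Z, stage 2 regresses y on fitted X."""
    Pi, *_ = np.linalg.lstsq(Z, X, rcond=None)   # first stage (reduced-form coefficients)
    X_hat = Z @ Pi                               # exogenous columns of X are reproduced exactly
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Assumed DGP for illustration
rng = np.random.default_rng(11)
n = 100_000
z1, z2, v = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + v + rng.normal(size=n)     # endogenous regressor
y = 1 + 2 * x + 3 * v + rng.normal(size=n)           # v is in the structural error
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])
beta_2sls = two_sls(y, X, Z)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_2sls, beta_ols)
```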
Three-Stage Least Squares
- takes 2SLS one step further.
- Iterative 3SLS is a nonlinear procedure that continues the 3SLS process until the parameter estimates converge to stable values.
- tends to overfit the sample
- 3SLS exploits insight: since the separate models in a simultaneous system operate "simultaneously", it is likely that unobserved effects (errors) are contemporaneously correlated.
Limited-Information Maximum Likelihood, Full Info....
these MLE approaches make stronger assumptions about the distribution of RVs than purely sampling-theory approaches (assumption of normality); inferences are conditional on the sample info. LIML is the ML counterpart of 2SLS; FIML is the ML counterpart of 3SLS
Simultaneous equation modeling
- a somewhat blunt tool
- imposes more structure than meets the eye (depends heavily on the specification being linear and correctly specified)
- any misspecification in one equation propagates throughout the system
- we began with the notion of (PRF) in which the parameters (the b's) and explanatory variables (the X's) were taken to be "fixed", and only the error term and the dependent variable were random variables.
- we have relaxed the assumption that the PRF parameters must be fixed (they can be cast as random variables in HLM), and permitted the X’s to be random variables (in systems of simultaneous equations).
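- log-lin model (ln y on x in levels): a 1-unit change in x -> multiply the slope coefficient by 100 to get the % change in y
- lin-log model (y on ln x): a 1% change in x -> divide the slope coefficient by 100 to get the change in y (in y's units)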
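A quick arithmetic check of both rules, with made-up coefficients. The ×100 and ÷100 rules are approximations that work for small changes. The exact log-lin effect of a one-unit change in x is 100·(e^b − 1) percent.

```python
import math

# log-lin: ln(y) = a + b*x, with an assumed b = 0.05
b_loglin = 0.05
approx_pct = 100 * b_loglin                  # rule of thumb: about 5% change in y per unit of x
exact_pct = 100 * (math.exp(b_loglin) - 1)   # exact: about 5.13%

# lin-log: y = a + b*ln(x), with an assumed b = 3.0
b_linlog = 3.0
approx_change = b_linlog / 100               # rule of thumb: 0.03 units of y per 1% rise in x
exact_change = b_linlog * math.log(1.01)     # exact: about 0.0299 units

print(approx_pct, round(exact_pct, 2), approx_change, round(exact_change, 4))
```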