The flashcards below were created by user jtpdogyo on FreezingBlue Flashcards.

What are the forms of specification error?
What are the consequences for OLS?
Including an irrelevant independent variable: OLS is unbiased and consistent, but inefficient.
Excluding a relevant independent variable: OLS is biased and inconsistent (though efficient in the narrow variance sense).
Wrong functional form.
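
A quick simulation makes the excluded-variable consequence concrete. This is an illustrative numpy sketch (the variable names and coefficients are invented, not from the cards): dropping a relevant regressor that is correlated with an included one biases the included coefficient by roughly beta2 * Cov(x1, x2) / Var(x1).

```python
# Hypothetical simulation: omitting a relevant, correlated regressor
# biases the OLS slope on the included variable.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(np.column_stack([x1, x2]), y)  # both regressors: unbiased
short = ols(x1.reshape(-1, 1), y)         # x2 omitted: biased slope

print(round(full[1], 2))   # near the true 2.0
print(round(short[1], 2))  # near 2 + 3*0.8 = 4.4 (omitted-variable bias)
```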

Relevance of stepwise regression to policy
popular tool for dealing with the specification problem, at least as it relates to the choice of explanatory variables; runs regressions on all possible subsets of explanatory variables; addresses the problem of "choice of variables".
 hidden assumptions:
 *there is a true model
 *the sample is a near-perfect representation of the population, so overfitting is not a problem
 *the data are capable of telling you something about the true specification

Stepwise regression
a modified forward routine that checks the impact of each new variable on the significance of the existing variables and drops accordingly; variables are dropped if they become nonsignificant as other predictors are added.

perils of stepwise regression
can violate sampling theory: the theory is based on sampling, yet stepwise regression focuses narrowly on fitting the best model to the sample; problem of overfitting the data.
since variables are chosen because they look like good predictors, estimates of anything associated with prediction can be misleading: magnitudes of regression coefficients and t statistics tend to be larger than they really are; standard errors are smaller than what would be observed; CIs tend to be too narrow; p values are too small; R2 and adjusted R2 are too large; the overall F ratio is too large and its p value too small; the standard error of the estimate is too small.
 *stepwise p values are nominally, not actually, significant
 *a troublesome feature of stepwise is that the character of the reported model can change dramatically, with some variables entering and others leaving
 *stepwise must exclude observations that are missing any of the potential predictors. however, some of these observations will not be missing any of the predictors in the final model. sometimes one or more of the predictors in the final model are no longer statistically significant when the model is fitted to the data set that includes these set-aside observations, even when values are missing at random.

forward selection
starts with an empty model; the variable with the smallest p value is placed in the model. variables are added one at a time as long as their p values are small enough (smallest p value in the presence of the predictors already in the equation)
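
The routine above can be sketched in numpy. This is a hedged illustration, not a production implementation: at each step the candidate whose coefficient has the largest |t| (equivalently, the smallest p value) is added, stopping when no candidate clears a threshold. All variable names and the threshold are invented for the example.

```python
# Illustrative forward selection by |t| statistic (largest |t| = smallest p).
import numpy as np

def t_stat_of_last(X, y):
    """|t| statistic of the last column's coefficient in an OLS fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return abs(beta[-1] / np.sqrt(cov[-1, -1]))

def forward_select(candidates, y, t_threshold=2.0):
    n = len(y)
    chosen = []
    X = np.ones((n, 1))                 # start with the intercept-only model
    remaining = dict(candidates)
    while remaining:
        trials = {name: t_stat_of_last(np.column_stack([X, col]), y)
                  for name, col in remaining.items()}
        best = max(trials, key=trials.get)
        if trials[best] < t_threshold:  # nothing significant enough to add
            break
        chosen.append(best)
        X = np.column_stack([X, remaining.pop(best)])
    return chosen

rng = np.random.default_rng(1)
n = 2000
x1, x2, noise = rng.normal(size=(3, n))
y = 1 + 2 * x1 + rng.normal(size=n)     # only x1 truly matters
selected = forward_select({"x1": x1, "x2": x2, "noise": noise}, y)
print(selected)
```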

backward elimination
starts with all predictors in the model. the least significant variable, the one with the largest p value, is removed and the model is refitted. each subsequent step removes the least significant variable until all remaining variables have individual p values smaller than some cutoff.
disadvantage: requires estimating the full model with all candidate variables up front, which demands a lot of degrees of freedom.

Other specification tests
 *AIC vs SIC: whether or not models are nested, the one with the lower AIC is better; appropriate for large samples; penalize additional regressors more harshly than adjusted R2
 *nonnested F test for "omnibus models"
 *MSE: cannot really compare models that have different dependent variables; ex post predictive power
 *Ramsey RESET

Ramsey RESET test
 *"an omitted variables test"
 *a test of nonlinearities in the choice of functional form
 *add powers of the predicted y hats to the model: yhat squared, cubed, quartic
 *if the F test on these coefficients is significant, there is evidence of specification error (omitted variables)
 *"estat ovtest" in Stata
 *misspecification of functional form
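
The mechanics can be sketched without Stata. This is a hedged numpy illustration of a RESET-style check (it is not `estat ovtest` itself, and the data-generating process is invented): fit OLS, augment with powers of the fitted values, and compute the F statistic on the change in RSS.

```python
# Illustrative RESET-style test: a quadratic truth fitted with a linear
# model should produce a very large F statistic on yhat^2 and yhat^3.
import numpy as np

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
y = 1 + x + 0.5 * x**2 + rng.normal(size=n)  # true model is quadratic

X = np.column_stack([np.ones(n), x])         # (mis)specified linear model
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta
Xr = np.column_stack([X, yhat**2, yhat**3])  # augment with powers of yhat

q = 2                                        # number of restrictions tested
F = ((rss(X, y) - rss(Xr, y)) / q) / (rss(Xr, y) / (n - Xr.shape[1]))
print(F > 4)  # far above conventional critical values -> misspecification
```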

Mainstream Bayesian perspective
 Whether a variable "belongs in the model or
 not" is equivalent to asking whether there is substantial prior probability in the neighborhood of zero for the associated coefficient. If there is little probability around zero, then the variable associated with that coefficient should be included in the model (ie, our priors indicate that it is an important variable)

Radical Bayesian perspective
 There is no such thing as a "true" model specification. It's more important to be able to forecast the next event or to provide the best information available on the first (partial) derivative associated with an explanatory variable and to be able to account for the uncertainty associated with the forecast or the marginal effect
 than it is to reify the objective existence of a "true population parameter" or of a "true, correct model"

Leamer's Extreme Bounds Analysis, "Pragmatic Bayesian Approach"
EBA is a second best approach to specification problems

Basic EBA
 (1) CLASSIFYING RHS VARS: Can be thought of as separating the explanatory variables into three sets: (1) the key inferential variable or variables; (2) the variables that are strongly motivated by theory, including basic control variables, and; (3) "doubtful variables", ones whose coefficients are associated with variables over which you have weak priors, variables that may have been suggested by the literature, but which you do not feel are extremely important
 (2) EXTREME VALUES OF THE KEY POLICY VARIABLE'S COEFFICIENT LYING WITHIN THE 90% DATA CONFIDENCE ELLIPSE
 (3) refinements: do more than simply record extreme values; the information contract curve

Full-blown EBA
 *entertains multiple theoretical perspectives
 *carrying out a series of basic EBAs, each of which is based on a competing substantive theory

Application of EBA; policy implications
capital punishment: a deterrent? conflicting prior beliefs on capital punishment. EBA allows researchers to reach a mutually accepted conclusion if the data are sufficiently strong. the framework may be inadequate for resolving the issue.
 The most important thing is the sensitivity of
 critical inferences to plausible changes in the model specification. The "truthseeking" goal of finding the "single correct specification" may have a place in the hard sciences, but it is typically not very useful in policy settings.

Difference b/w EBA & Stepwise
both second-best alternatives; one is a bottom-up approach to statistics, the other top-down
EBA: how Bayesian thinking can guide good practice; you don't have to be a Bayesian purist; it's the framework that matters
no theory behind stepwise

Measurement error in dependent variable
 Errors get swept into the error term.
 OLS is inefficient, yet consistent and unbiased

Measurement error in independent variable
 OLS generates parameter estimates that are
 systematically biased toward zero. This is a kind of endogeneity in which OLS becomes biased and inconsistent. It consequently becomes more difficult to reject null hypotheses.
 If your coefficients are significant despite the
 problem of measurement error in the explanatory variables, you have a robust result and there is often no need to worry about measurement error in X. At the very least, you have a significant lower bound for the marginal effect
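
The attenuation claim above is easy to verify by simulation. A hypothetical sketch (numbers invented): with classical measurement error in x, the OLS slope shrinks toward zero by the reliability ratio Var(x) / (Var(x) + Var(u)).

```python
# Hypothetical simulation: measurement error in X attenuates the slope.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)                # true regressor, variance 1
y = 2.0 * x + rng.normal(size=n)      # true slope is 2.0
x_obs = x + rng.normal(size=n)        # observed with error, error variance 1

slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
print(round(slope, 2))  # near 2 * 1/(1+1) = 1.0, not the true 2.0
```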

Consistency
An estimator is consistent if its bias goes away as the number of observations increases; an inconsistent estimator's bias does not go away no matter how large the sample.

What happens with restrictions on the dep var?
 in many cases OLS is biased & inconsistent.
 OLS' optimal properties break down.
 the degree of bias depends on how effectively constraining the restriction is.
 need to resort to MLE in order to estimate the parameters.

Situations in which OLS breaks down and the solutions
 Multicollinearity:
 Heteroskedasticity: WLS form of GLS
 Autocorrelation: GLS
 Specification error: theory; sensitivity analysis (EBA)
 Measurement error: formula for theoretical bias

Binary dummy dependent variables
attractive property of a binary dependent variable: models produce predictions that can be interpreted as probabilities that Y=1.
 the parameter estimates of the slopes can be interpreted as marginal effects on the probability that y=1. this feature of the LPM makes it easier to interpret than other models of binary dependent variables.
good when the dataset is large and few predicted values lie near the bounds.
Tools: regular OLS (LPM), logit MLE (cumulative logistic distribution function), probit MLE

shortcomings of LPM
 *built-in heteroskedasticity (the error variance depends on X)
 *predicted probabilities are not bounded by 0 and 1
 *R2 is uninterpretable
 *error term is not plausibly normally distributed (implications for standard errors, t-tests, inferences)

remedy to problems of LPM
 First:
 *constrain the y hats to lie between 0 and 1. gives a Z-shaped regression curve
 *problem: predicts some things will take place with certainty, others not at all. statisticians don't like a probability of occurrence of exactly 100%
 Second:
 *transform the values of the dependent variable via the logistic transformation
 *convert into "log odds" = a problem if the dummy is binary, since the odds of y=0 and y=1 are 0 and infinite

MLE perspective on problems posed by the LPM: binary logit and probit
 *Bernoulli structure of the dependent variable (ie, a string of zeros and ones)
 *Each outcome: y=1 with probability p; y=0 with probability 1 − p.
 *p=F(Xbeta), where F(Xb) is known as a "linkage function" because it links y to p
 *cumulative logistic function F(Xb) structure: F(Xb) = 1/(1 + e^(−Xb))
 *likelihood: the product over observations of p^y (1 − p)^(1 − y); maximize this (its log), get MLEs

linkage functions
 *function that links y to p
 *F() can represent the linear model (leading to the LPM), the cumulative logistic function, the cumulative normal distribution probit ("normit"), the poisson distribution, the negative binomial distribution, or any one of a family of functions that generate sigmoid curves bounded by zero and one
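
Two of these linkage functions can be written out directly. A minimal sketch using only the standard library (the function names here are my own): both map any real index Xb into (0, 1), differing only in the shape of the sigmoid.

```python
# The logistic and standard-normal (probit) links, p = F(Xb).
import math

def logit_link(xb):
    """Cumulative logistic function: 1 / (1 + e^(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

def probit_link(xb):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

for xb in (-2.0, 0.0, 2.0):
    print(xb, round(logit_link(xb), 3), round(probit_link(xb), 3))
```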

difference b/w logistic Fxb and probit Fxb
 *arbitrary
 *Probit lends itself more naturally to
 the economics of random utility theory and is sometimes preferred on these grounds

random utility theory
 *an individual has a continuous utility or index function that is unobserved. this is the real dependent variable. it reveals itself as a 1 or 0 depending on whether or not the index function crosses a subjective threshold.
 *latent dependent variable interpretation for discrete choice models (logit/probit models)

logit problems
 *coefficients are difficult to interpret; turn to odds ratios
 *measures of goodness of fit (gof) are not as tidy
 *displaying results is messy
 *when y takes on more than one categorical value (i.e. Y is multinomial), logit imposes the restriction of the "independence of irrelevant alternatives"
 *problem of "complete separation" when there is no overlap in y=0/1 values with respect to x → cases of perfect prediction are dropped from estimation

logit measures of gof
 *pseudo R2
 * contingency table of predicted vs actual values of y given a certain threshold probability (.5)
 *AIC SIC
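
The contingency-table measure above is simple to build. An illustrative sketch with made-up predicted probabilities: classify each observation as 1 if its predicted probability clears the threshold (0.5 here), then cross-tabulate against the actual outcomes.

```python
# Contingency table of actual vs predicted y at a 0.5 threshold
# (the probabilities and outcomes below are invented for illustration).
import numpy as np

p_hat = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.1])  # model's predicted P(y=1)
y     = np.array([1,   1,   0,   1,   1,   0  ])  # actual outcomes

pred = (p_hat >= 0.5).astype(int)
table = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y, pred):
    table[actual, predicted] += 1

print(table)  # rows: actual 0/1, columns: predicted 0/1
```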

Ordinal or multinomial world
 When the dependent variable takes on more than two
 categories of values

probit vs logit in ordinal and multinomial settings
 *probit is more flexible and is preferred.
 *Logit models impose the rigid assumption of
 "independence of irrelevant alternatives"; logit
 doesn't work well when the alternatives are close substitutes or close complements. Probit permits alternatives to possess a degree of substitutability
 or complementarity.
 *Multinomial probit, like binary probit, lacks
 the (relative) convenience of lending itself to creating odds ratios. It is therefore difficult to interpret estimated coefficients directly

independence of irrelevant alternatives
 *assumption: outcomes of the categorical or ordinal dependent variable are completely independent.
 (Hausman test)
 *no close associations among the response categories or no correlation in errors across categories

logit and probit problems
 *sensitive to model specification and multicollinearity
 *"complete separation"

censored dep vars (tobit)
 *you have a subset of the observations clumped on
 either an upper or lower bound (or both)
 *we observe all of the x values, but not the genuine y values that fall outside the bounds. censored values of y are reported as being equal to the value of the bound that is being exceeded.

truncated dep vars (tobit)
 *you simply have a sample in which observations do not all have the same probability of having been included in the sample
 *we know all of the correct values of x and y, it's just that y never exceeds the bounds

tobit ML estimator, IMR
 *deals with censoring, truncation, survival analysis, duration, selection bias
 *incorporates the appropriate version of the "inverse Mills ratio" (IMR) into the respective model
 *IMR: a sort of "hazard function", a measure of the distortion caused when observations approach the limiting bound(s). When included in the model it corrects for the bias induced by truncation or censoring
 *The appropriate version of the IMR depends on the nature of the problem and whether observations are constrained from above or from below.
 *bias due to omitting a relevant variable, namely some relevant version of the IMR

poisson regression
 *maximum likelihood method for handling dependent variables that can be thought of as "counts" (nonnegative whole #s)
 *restrictive assumption that the mean is equal to the
 variance
 *If the counts in the dataset do not include many
 zero values or are not close to zero, then regular OLS regression is fine
 *If the mean is not plausibly equal to the
 variance, we have "overdispersion" (Poisson model is not appropriate via Hausman test)
 *maximum likelihood negative binomial reg is less restricted than poisson
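
The equal-mean-variance assumption can be eyeballed directly. An illustrative check (not a formal test, and the parameters are invented): a genuinely Poisson count has sample variance close to its mean, while a negative-binomial count is overdispersed.

```python
# Comparing mean vs variance to spot overdispersion in count data.
import numpy as np

rng = np.random.default_rng(4)
poisson_counts = rng.poisson(lam=3.0, size=10_000)
# Negative binomial with mean 3 but variance 7.5: overdispersed
overdispersed = rng.negative_binomial(n=2, p=0.4, size=10_000)

for name, c in [("poisson", poisson_counts), ("neg. binomial", overdispersed)]:
    print(name, round(c.mean(), 2), round(c.var(), 2))
```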

measurement scale of variables and the relationship between measurement scale and general linear inference
 can put interval or ratio scale variables into nominal categories in contingency tables
 *ordinal variables in OLS via dummy variables; test significance of marginal impacts via F-tests

panel data
longitudinal: combining cross-section with time-series data

crosssection time series data structures
FEM, REM, SUR
Pooled: ignores this structure

advantages of panel
 *greater variation in x
 *less collinearity
 *autocorrelation & heteroskedasticity affect standard errors in opposite directions
 *greater df
 *better captures the dynamics of change => incorporates the "long-term" effects of the cross-section & the "short-term adjustment effects" of the time-series

DW stat in cross time datasets
 *may not work properly
 *computer usually doesn't know that the dataset has this special structure, so it calculates the DW stat as if the entire pooled dataset had a purely timeseries structure

FEM: crosssectional effects only ("oneway fixed effects")
 *dummy variables for each crosssection
 *Type II model spec
 *include dummy interaction terms as well (Type IV)
 *"control for unobservable timeinvariant [unitspecific]
 heterogeneity across the crosssectional units"

FEM: time effects only
 *"control for unobservable unitinvariant
 heterogeneity across time"
 *Using dummy variables for time is a much gentler and more flexible way to deal with time than simply including time as a trend
 variable; tradeoff between flexibility and efficiency. Adding a set of dummy variables consumes more degrees of freedom than does a simple timetrend variable. In small samples, efficiency may be more valuable than flexibility

fixed crosssectional plus time period effects
(2way)
 *dummies for cross-section and time (minus one in each case to avoid perfect multicollinearity)
 *coefficients of the DVs for time: "hold the level effects of time constant"; not of interest
 *include dummy interaction terms for cross-section (Type IV extensions to FEM)

difference b/w REM and FEM
 (2)
 REM: forces the error components to be independent of both the explanatory variables and the overall error term
 FEM: not the case
(3) FEM: unbiased & consistent; REM is more efficient if big N and small T, but biased to the extent that the error term is correlated with the independent variables.
(4) If the cross-sectional units in the sample represent the entire population of units, then FEM is preferred over REM.
Hausman test for REM: if you reject the null, stick with FEM

REM
 *empirical Bayes models ("shrinkage estimators", "Stein estimators") all boil down to a form of random effects model
 *REMs specify that the intercept parameter and slope parameters are random variables (fiddles with the OLS assumption of true and fixed parameters; consistent with Bayes, a problem for sampling theorists)
 *random intercept term that varies across cross-sectional units (Type 2)
 *fixed intercept overall ("grand intercept"), and the intercept for each cross-sectional unit is a random perturbation off of this grand intercept

The ML estimator for a given cross-sectional unit has the general form of a shrinkage estimator
A parameter is a weighted average of the grand parameter estimate and the estimate for the crosssectional unit in question, the weights being in proportion to the precision of their respective parameter estimates
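
The weighted average described above can be written in a few lines. A minimal sketch, with invented numbers: each estimate is weighted by its precision (the reciprocal of its variance), so the noisier the unit estimate, the harder it is pulled toward the grand estimate.

```python
# Precision-weighted shrinkage of a unit estimate toward a grand estimate.
def shrink(unit_est, unit_var, grand_est, grand_var):
    """Weighted average with weights proportional to precision (1/variance)."""
    w_unit, w_grand = 1.0 / unit_var, 1.0 / grand_var
    return (w_unit * unit_est + w_grand * grand_est) / (w_unit + w_grand)

# A noisy unit estimate (variance 4) is pulled most of the way toward a
# precisely estimated grand mean (variance 1): (0.25*10 + 1*2)/1.25 = 3.6
print(shrink(10.0, 4.0, 2.0, 1.0))
```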

ML estimator and Bayesian estimator
same, except instead of having a crosssectional parameter shrink toward a subjective prior, it shrinks toward the empirical grand mean

Bayesian shrinkage formula
assumption that the error term is asymptotically normally distributed

Seemingly Unrelated Regression
 *permits "contemporaneous correlation" across time among the error terms of two or more parallel models
 *SUR relaxes the assumption of no correlation in the errors across the models when running time-series models separately (Type V)
 *Errors are homoskedastic within each cross-section and have no autocorrelation, but are typically groupwise heteroskedastic across the cross-sections (Type V). Correlation occurs across cross-sections
 *generally sharper standard errors for coefficients than a Type V specification estimated equation by equation
 *accounts for causal forces that are not explicitly identified, yet which "inhere in the environment in which the data are generated".

Stein's paradox
Borrowing strength by shrinking the maximum likelihood estimate toward a grand mean improves predictive power, albeit at some price in terms of (Bayesian) bias

Hierarchical Linear Models
 Sometimes data generated at one level of analysis can be thought of as being "nested" within higherlevel clusters.
 Special case of REM.
 Relationship is cross-sectional; one level of subjects in which a set of occasions is embedded. (Panel: crossed factors; HLM: nested factors)

Empirical Bayes Estimators
 Stein's Paradox
 Empirical Bayes shrinkage estimators: akin to nonparametric smoothing in that they smooth MLEs toward the prior in the form of an empirical "grand mean"
 REMs, HLM

Borrowing strength
 Way to incorporate seemingly extraneous information into our analysis in order to sharpen our predictions and inferences. Good for messy data, dealing with unobserved variables and effects without having to define what they are.
 Simple linear regression: prediction "borrows strength" from observations made at different levels. Takes prediction for specific level and "shrinks" it toward the conditional mean for all levels. Greater df, sharper standard error than the more restrictive prediction.
 FEM with Type 4 structure: borrow strength from all the different crosssectional units to help estimate the parameters for each of the units individually (DVs capture the "level effects" of unobserved factors specific to crosssectional units or to time periods)
 SUR: borrows strength by permitting there to be contemporaneous correlation in the errors across crosssectional models. Accounts for unspecified effects that affect pairs of crosssectional units similarly.
 REM (including HLM): borrow strength by shrinking the parameter estimates for specific crosssections toward the grand mean effect for the ensemble of all crosssectional units (AKA empirical Bayes theory).
 Classical Bayesian models: borrow strength from the prior by shrinking the maximum likelihood estimates toward the prior. MLEs modified in light of information drawn from theory or previous experience.

Accounting for effects of unobserved variables
 FEM, REM: capture the level effects of unobservables
 SUR: capture contextual "environmental effects"
 Tobit: deal w sample selection & self selection bias
 HLM: capture unobserved effects related to how data may be nested hierarchically

Dynamic models
 Lagged variables for data with a time dimension:
 *Models with lagged dependent variables on the
 righthand side="autoregressive models"
 *Models with lagged independent variables= "distributed lag models"
 *Models with first differences on either the lefthand or righthand side (or both)

How many lags to include?
 Pragmatic, bottomup response: Test the coefficients of lagged terms, keep lagging until they become insignificant
 Infinite lags: The Koyck transformation (converting infinite distributed lags of X to an autoregressive specification)
 Polynomial lags: The Almon lag
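
The Koyck transformation mentioned above converts an infinite geometric distributed lag into an autoregressive model with a single lag of y. The standard derivation (supplied here, not on the card):

```latex
% infinite distributed lag with geometrically declining weights, |\lambda| < 1
y_t = \alpha + \beta \sum_{i=0}^{\infty} \lambda^i x_{t-i} + \varepsilon_t
% lag the equation once and multiply through by \lambda:
\lambda y_{t-1} = \alpha\lambda + \beta \sum_{i=0}^{\infty} \lambda^{i+1} x_{t-1-i} + \lambda\varepsilon_{t-1}
% subtracting eliminates the infinite sum, leaving one lag of y:
y_t = \alpha(1-\lambda) + \beta x_t + \lambda y_{t-1} + (\varepsilon_t - \lambda\varepsilon_{t-1})
```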

Kalman filter
Allows the intercept and slopes to adapt over time via Bayesian updating. This eliminates the problem of having your coefficient estimates depend on which particular slice of time your sample represents. Kalman filter models are asymptotically related to ARIMA time-series estimation.

Instrumental variables
 generic solution to the problem of endogeneity: correlation between the error term and one or more of the RHS variables. (problem: OLS is biased and inconsistent, with wrong SEs; seen this with measurement error in the Xs and omitted variables).
 instruments meet two criteria: (1) they are highly correlated with the endogenous explanatory variable in question, and; (2) they have zero correlation with the unobserved factors affecting y (independent of the error term in the PRF)

proxy variable
a measurable variable that is highly correlated with a second variable that is usually either unobserved or difficult to measure

note about instrumental variables
 difficult to come up with credible instrumental variables.
 IVs are credible in simultaneous equation systems.

Multivariate analysis: simultaneous equation models
 causal arrow diagrams.
 Models in which the dependent variable of one
 model enters as an explanatory variable in another model
 Models in which an exogenous variable affects two
 endogenous variables simultaneously
 Models in which one or more identities link dependent variables across models

difference b/w simultaneous equation models vs SUR
 Simultaneous: models are linked together via the systematic parts of the model
 SUR: linked via the error term

exogenous (predetermined)
uncorrelated with error terms.
lagged endogenous variables are considered predetermined

Simultaneity
 induces a correlation between the error term and one or more of the righthand side variables in one or more of the individual structural equations
 OLS a biased and inconsistent estimator
 poses problem of "endogeneity".
 2 problems: identification and estimation

structural model vs reducedform
 There are as many reducedform models as there are endogenous variables in the system. Each
 one of them expresses an endogenous variable as a function of all of the exogenous and predetermined variables in the system.

Identification problem
 The inability to algebraically manipulate reducedform parameter estimates to come up with unique structural parameter estimates.
 Underidentified: no logical way to come up with unique parameter estimates; there is for exactly identified and overidentified equations
 If you are not concerned with drawing inferences about individual parameters but merely interested in forecasting, there's no problem: the reducedform equations make perfectly good forecasting models.
 An equation within a simultaneous system is identified to the extent that the other equations in the system include predetermined variables. Adding additional predetermined (exogenous) variables to an underidentified equation will not help to identify its own parameters.

order condition
a way to check identifiability: K − k ≥ m − 1
 K: # of exogenous vars in the entire system
 k: # of exogenous vars in the equation
 m: # of endogenous vars in the equation
 (K − k = m − 1: exactly identified; K − k > m − 1: overidentified; K − k < m − 1: underidentified)
rank condition

estimation problem
OLS becomes biased & inconsistent; using OLS to estimate the first equation while ignoring the presence of the 2nd equation is inappropriate; the system of equations must be estimated simultaneously.

Indirect Least Squares
 for exactly identified equations in a simultaneous system.
 exactly one way to translate the Π coefficients from the reduced-form system into the β coefficients in the structural model

TwoStage Least Squares
 for estimating just-identified and overidentified equations in a system of simultaneous equations.
 key workhorse of simultaneous equation estimation.
 builds directly on and trumps ILS.
 a method that yields unique parameter estimates for the coefficients even when the equation is overidentified.
 If the first stage of 2SLS is a poor fit (low R2, low t statistics), then the system may be identified, but the instrument (predicted value of the endogenous RHS variable) is weak & the estimator possibly inconsistent.
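
The two stages can be run by hand in numpy. A hedged simulation sketch (the instrument, coefficients, and data-generating process are all invented): stage 1 regresses the endogenous x on the instrument z; stage 2 regresses y on the fitted values.

```python
# Illustrative 2SLS with one endogenous regressor and one instrument.
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
z = rng.normal(size=n)                      # instrument: drives x, not e
e = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.6 * e + rng.normal(size=n)  # endogenous: correlated with e
y = 1.0 + 2.0 * x + e                       # true slope is 2.0

def ols(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_ols = ols(x, y)[1]            # biased upward: picks up Cov(x, e)
a = ols(z, x)                   # stage 1: regress x on z
xhat = a[0] + a[1] * z
b_2sls = ols(xhat, y)[1]        # stage 2: consistent for the true 2.0
print(round(b_ols, 2), round(b_2sls, 2))
```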

ThreeStage Least Squares
 takes 2SLS one step further.
 Iterative 3SLS is a nonlinear procedure that continues the 3SLS process until the parameter estimates converge to stable values.
 tends to overfit the sample
 3SLS exploits an insight: since the separate models in a simultaneous system operate "simultaneously", it is likely that the unobserved effects (errors) are contemporaneously correlated.

Limited-Information Maximum Likelihood, Full-Information Maximum Likelihood
these MLE approaches make stronger assumptions about the distribution of RVs than purely sampling theories; assumption of normality; inferences drawn are conditional on the sample info. LIML corresponds to 2SLS; FIML corresponds to 3SLS

Simultaneous equation modeling
 a somewhat blunt tool
 imposes more structure than meets the eye (depends heavily on the specification being linear and correctly specified)
 any misspecification in one equation propagates throughout the system

FINAL NOTE
 we began with the notion of the population regression function (PRF) in which the parameters (the b's) and explanatory variables (the X's) were taken to be "fixed", and only the error term and the dependent variable were random variables.
 we have relaxed the assumption that the PRF parameters must be fixed (they can be cast as random variables in HLM), and permitted the X’s to be random variables (in systems of simultaneous equations).

log-linear vs lin-log interpretation
 Log-lin (y is logged):
 for a 1-unit change in x, multiply the slope coefficient by 100 to get the approximate % change in y
 Lin-log (x is logged):
 for a 1% change in x, divide the slope coefficient by 100 to get the approximate unit change in y
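
The two interpretation rules amount to simple arithmetic. An illustrative sketch with hypothetical coefficients (not from the cards):

```python
# Log-lin: 100*b approximates the % change in y per 1-unit change in x.
b_loglin = 0.05
pct_change_in_y = 100 * b_loglin
print(f"log-lin: 1-unit change in x -> about {pct_change_in_y}% change in y")

# Lin-log: b/100 approximates the unit change in y per 1% change in x.
b_linlog = 30.0
unit_change_in_y = b_linlog / 100
print(f"lin-log: 1% change in x -> about {unit_change_in_y} unit change in y")
```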

