A.2. Anderson - GLMs
The flashcards below were created by user EExam8 on FreezingBlue Flashcards.

Modeling techniques
 one-way analysis: simple to calculate and intuitive, but hides the impact of correlated variables (e.g. age vs. vehicle age) and of interdependencies between variables in the way they impact the modeled variable (e.g. male/female vs. age)
 minimum bias method: adjusts for correlated variables, but unable to test the significance of variables, obtain a range for the parameters, or assess the quality of the model
 classical linear models: adjust for correlated variables and have a statistical basis, but carry restrictive assumptions about the distribution of the dependent variable
 generalized linear models: same as linear models but with fewer restrictions; however, they are more complex and the results are harder to explain

Linear models
 express the relationship between observed response variable Y and covariates X
 Y = µ + ɛ, where µ = X β, ɛ is normally distributed (0, σ^{2})

Components of a linear model
 (LM 1) random component: each component of Y is independent and normally distributed with common variance σ^{2}
 (LM 2) systematic component: the p covariates are combined to give the linear predictor η = X β
 (LM 3) link function: relationship between random & systematic components (the identity function for the classical linear model)
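The three components above can be sketched in code for the classical case; a minimal ordinary-least-squares fit (all data below are made up for illustration):

```python
import numpy as np

# Classical linear model: Y = X @ beta + eps, eps ~ Normal(0, sigma^2).
# (LM 1) independent normal errors with common variance,
# (LM 2) linear predictor eta = X @ beta, (LM 3) identity link, so mu = eta.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # intercept + 1 covariate
beta_true = np.array([2.0, 0.5])                          # assumed for the demo
Y = X @ beta_true + rng.normal(0.0, 1.0, n)               # sigma = 1

# Ordinary least squares recovers beta by minimizing the sum of squared errors
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # close to [2.0, 0.5]
```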

Limitations of linear models
 difficult to verify normality and constant variance for response variables
 values of the response variable may be restricted to be > 0 (so they can't be normally distributed)
 if the response variable is > 0, then the variance of Y → 0 as E(Y) → 0 (contradicting constant variance)
 the model cannot capture multiplicative effects of the predictors

Define GLM
Statistical method to measure the relationship that a function of a linear combination of one or more explanatory variables has on a single dependent random variable that is assumed to come from the exponential family of distributions.

Benefits of GLMs
 statistical framework allows for explicit assumptions about the nature of the data and its relationship with predictive variables
 GLMs can be solved with standard, computationally efficient algorithms rather than the slower iterative methods needed for minimum bias
 GLMs provide statistical diagnostics which aid in selecting only significant variables and validating model assumptions
 adjusts for correlations between variables and allows for interaction effects

Components of a generalized linear model
 from linear models, remove assumptions of normality, constant variance, additivity
 (GLM 1) random component: each component of Y is independent and from one of the exponential family of distributions
 (GLM 2) systematic component: the p covariates are combined to give the linear predictor η = X β
 (GLM 3) link function: relationship between random and systematic components is specified via a link function g that is differentiable and monotonic such that E[Y] = µ = g^{-1}(η)

Properties of members of the exponential families
 distribution is completely specified by mean and variance
 the variance of Y_{i} is a function of its mean, with Var(Y_{i}) = 𝜙V(µ_{i}) / ω_{i}
 ω is the prior weight, usually exposure so the model is more responsive to credible data

Common Exponential Family Variance Function
 Normal: V(x) = 1
 Poisson: V(x) = x
 Gamma: V(x) = x^{2}
 Inverse Gaussian: V(x) = x^{3}
 Binomial: V(x) = x (1 - x)
 Normal: variance function gives each observation equal weight (the fitted value is attracted to every data point equally)
 Poisson, Gamma: variance function assumes the variance increases with the expected value (gives less weight to observations with high expected values)

Tweedie distribution
 special case of V(x) = (1/λ) x^{p}, where p < 0, 1 < p < 2, or p > 2
 good for pure premium, allows a large point mass at zero (many policies have no claims)
 for 1<p<2, ≈ compound distribution of Poisson (frequency) and Gamma (severity)
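The compound Poisson-Gamma interpretation can be illustrated by simulation; a sketch assuming made-up frequency and severity parameters:

```python
import numpy as np

# For 1 < p < 2 the Tweedie behaves like a compound Poisson (claim count)
# - Gamma (claim size) sum, which is why it suits pure premium: a large
# point mass at zero plus a continuous positive part. Parameters are assumed.
rng = np.random.default_rng(1)
n_policies = 100_000
lam = 0.1                      # Poisson frequency (assumed)
shape, scale = 2.0, 500.0      # Gamma severity parameters (assumed)

counts = rng.poisson(lam, n_policies)
pure_premium = np.array([rng.gamma(shape, scale, k).sum() for k in counts])

share_zero = np.mean(pure_premium == 0)  # point mass at zero
print(share_zero)  # ≈ exp(-0.1) ≈ 0.905, i.e. most policies have no claims
```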

Solving a CLM
 write the general equation with Y, β, X, ɛ
 write actual equations for each i
 solve each equation for ɛ_{i}
 define SSE equation = ∑ ɛ_{i}^{2}
 minimize SSE (set derivative with respect to β_{i} equal to 0)
 solve for values of β
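The steps above can be sketched in code: setting the derivative of the SSE to zero leads to the normal equations (toy data, purely illustrative):

```python
import numpy as np

# Solving a classical linear model: the SSE-minimization steps reduce to
# d/d(beta) sum(eps_i^2) = 0  =>  (X^T X) beta = X^T Y  (normal equations).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([3.1, 4.9, 7.2, 8.8])

beta = np.linalg.solve(X.T @ X, X.T @ Y)   # closed-form least squares
sse = np.sum((Y - X @ beta) ** 2)          # SSE at the minimum
print(beta, sse)
```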

Solving a GLM
 identify the likelihood function
 take the log to turn the product of several items into a sum
 maximize the log of the likelihood function and set the partial derivatives for each parameter to zero
 solve the resulting system of equations
 compute the predicted values
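These steps can be sketched for a Poisson model with log link, where the score equations from the log-likelihood are solved by Newton-Raphson (simulated data, illustrative only):

```python
import numpy as np

# Poisson GLM with log link: maximize l(beta) = sum(y_i*eta_i - exp(eta_i)) + c
# by setting the partial derivatives (the score) to zero via Newton-Raphson.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.7])           # assumed for the demo
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                  # predicted values, g^{-1}(eta)
    score = X.T @ (y - mu)                 # partial derivatives of log-likelihood
    info = X.T @ (mu[:, None] * X)         # observed information (negative Hessian)
    beta = beta + np.linalg.solve(info, score)

print(beta)  # close to [0.3, 0.7]
```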

The scale parameter ϕ
 for Poisson, ϕ = 1
 for other distributions, ϕ not known (estimated from data)
 it is not necessary to know ϕ to estimate β’s, but it’s used for statistical assessments (SSE)

Methods to evaluate 𝜙
 maximum likelihood: not feasible in practice (no explicit formula)
 Pearson 𝝌^{2} statistic: 𝜙(hat) = 1/(n - p) ∑ [ ω_{i} (Y_{i} - μ_{i})^{2} / V(μ_{i}) ]
 total deviance estimator: 𝜙(hat) = D / (n - p), D given or equal to SSE in the classical LM
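The Pearson estimator can be sketched directly; the Gamma variance function and all inputs below are assumed for illustration:

```python
import numpy as np

# Pearson chi-square estimate of the scale parameter phi:
# phi_hat = 1/(n - p) * sum( w_i * (Y_i - mu_i)^2 / V(mu_i) ),
# shown here for a Gamma model, where V(mu) = mu^2. Inputs are made up.
Y = np.array([120.0, 80.0, 150.0, 95.0, 110.0])
mu = np.array([100.0, 90.0, 140.0, 100.0, 105.0])  # fitted means
w = np.ones_like(Y)                                 # prior weights (exposure)
p = 2                                               # number of fitted parameters
n = len(Y)

V = mu ** 2                                         # Gamma variance function
phi_hat = np.sum(w * (Y - mu) ** 2 / V) / (n - p)
print(phi_hat)
```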

Link functions
 an LM requires Y to be additive in the covariates; a GLM only requires g(Y) to be additive
 link function must be differentiable and monotonic (strictly increasing or decreasing)
 Identity g(x) = g^{-1}(x) = x. Used for the classical linear model
 Log g(x) = ln(x); g^{-1}(x) = e^{x}. Common as it makes everything multiplicative
 Logit g(x) = ln(x/(1 - x)); g^{-1}(x) = e^{x}/(1 + e^{x}). Used for retention or 0 < probability < 1
 Reciprocal g(x) = g^{-1}(x) = 1/x
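A quick sketch of the listed links and their inverses (the domain restrictions are why x = 0.3 is used, since the logit needs 0 < x < 1):

```python
import math

# The four link functions above, paired with their inverses; a check that
# each pair round-trips, i.e. g^{-1}(g(x)) == x on a shared valid domain.
links = {
    "identity":   (lambda x: x,                     lambda e: e),
    "log":        (lambda x: math.log(x),           lambda e: math.exp(e)),
    "logit":      (lambda x: math.log(x / (1 - x)), lambda e: math.exp(e) / (1 + math.exp(e))),
    "reciprocal": (lambda x: 1 / x,                 lambda e: 1 / e),
}

x = 0.3  # valid for every domain above (0 < x < 1 covers the logit)
for name, (g, g_inv) in links.items():
    assert abs(g_inv(g(x)) - x) < 1e-12, name
print("all link/inverse pairs round-trip")
```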

Offset term
 used to fix the impact of an explanatory variable
 offset term is known (not estimated), and different for each observation
 common use is when fitting a multiplicative GLM to an observed number/count: use ξ_{i} = ln(x_{i})
 η = X β + ξ
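A sketch of the offset in practice, assuming a multiplicative (log-link) Poisson claim-count model with made-up exposures:

```python
import numpy as np

# With a log link, adding the known term ln(exposure_i) as an offset makes
# the model estimate frequency per unit of exposure. The offset is fixed,
# per-observation, and not estimated. All data below are simulated.
rng = np.random.default_rng(3)
n = 1000
exposure = rng.uniform(0.5, 2.0, n)
X = np.ones((n, 1))                      # intercept-only model for simplicity
true_freq = 0.08                         # assumed frequency per exposure
counts = rng.poisson(exposure * true_freq)

offset = np.log(exposure)                # known offset, eta = X beta + offset
beta = np.zeros(1)
for _ in range(25):                      # Newton-Raphson, as for any Poisson GLM
    mu = np.exp(X @ beta + offset)
    beta = beta + np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (counts - mu))

print(np.exp(beta[0]))  # close to 0.08
```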

Typical GLM models for insurance
 frequency: multiplicative Poisson (Log link function, Poisson error term); invariant to time, i.e. frequency by month/year is the same. ω_{i} is usually set as the exposure (or offset = log of exposure for claim count)
 severity: multiplicative Gamma (Log link function, Gamma error term); invariant to currency
 pure premium: Tweedie (compound of Poisson and Gamma)
 probability (e.g. retention): Logistic (Logit link and binomial error term)

GLM Summary
 μ_{i} = E[Y_{i}] = g^{-1}( ∑_{j} X_{ij} β_{j} + ξ_{i} )
 Var[Y_{i}] = 𝜙V(μ_{i}) / ω_{i}
 Y_{i} is the vector of responses
 g(x) is the link function; relates expected response to linear combination of observed factors
 X_{ij} is the design matrix produced from the factors
 β_{j} is the vector of model parameters (estimated)
 ξ_{i} is a vector of known effects or "offsets"
 𝜙 is a parameter to scale the variance function V(x)
 ω_{i} is the prior weight assigning a credibility to each observation

Aliasing
 aliasing: linear dependency among observed covariates resulting in a model that's not uniquely defined
 intrinsic: dependencies inherent in the X_{i} definition (e.g. can only be in 1 group at a time)
 extrinsic: dependency resulting from the nature of the data (e.g. same unknowns)
 near aliasing: occurs when factors are almost, but not quite, perfectly correlated

Correcting for aliasing
 aliasing: remove covariates that create a linear dependency until there are fewer parameters than unique levels of the variables
 near aliasing: reclassify or delete the problematic observations
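Intrinsic aliasing can be sketched as a rank check on the design matrix (made-up factor with three levels):

```python
import numpy as np

# Dummy-coding every level of a factor alongside an intercept creates a
# linear dependency (A + B + C = intercept), so the model is not uniquely
# defined; the design matrix is rank-deficient. Dropping one level as the
# base restores full rank, which is the usual correction for aliasing.
intercept = np.ones((3, 1))
A = np.array([[1.0], [0.0], [0.0]])
B = np.array([[0.0], [1.0], [0.0]])
C = np.array([[0.0], [0.0], [1.0]])

X_aliased = np.hstack([intercept, A, B, C])   # 4 columns, dependent
X_fixed = np.hstack([intercept, B, C])        # level A as the base level

print(np.linalg.matrix_rank(X_aliased))  # 3 < 4 columns -> aliased
print(np.linalg.matrix_rank(X_fixed))    # 3 == 3 columns -> uniquely defined
```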

Model diagnostics (testing variables in the model)
 standard error / size of confidence interval: visually assessed by the speed with which the log-likelihood falls from its maximum given a change in the parameter
 deviance / type III test: create 2 models, with and without parameters to be tested
 𝝌^{2} test: D_{1} - D_{2} compared to the 𝝌^{2} distribution with df_{1} - df_{2} d.f. (a larger statistic favors the larger model)
 F-test: (D_{1} - D_{2}) / [(df_{1} - df_{2}) D_{2} / df_{2}], compared to F with df_{1} - df_{2} and df_{2} d.f. (a larger statistic favors the larger model)
 testing parameter consistency over time
 intuition: whether we expect a factor to impact the results
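The deviance chi-square test can be sketched with made-up deviances; the closed-form survival function below only holds for exactly 2 degrees of freedom, which keeps the example dependency-free:

```python
import math

# Deviance (type III) test: fit the model with and without the parameter,
# then compare D1 - D2 to a chi-square with df1 - df2 degrees of freedom.
# For 2 d.f. the chi-square survival function is exactly exp(-x/2).
# All deviances below are made up for illustration.
D1, df1 = 250.0, 98   # smaller model (parameter removed)
D2, df2 = 238.0, 96   # larger model (parameter included)

stat = D1 - D2                   # 12.0
ddf = df1 - df2                  # 2
p_value = math.exp(-stat / 2)    # chi-square sf, valid only for ddf == 2
print(stat, p_value)             # small p-value -> the parameter is significant
```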