is a sample statistic U used to provide a one-point approximation to the value of an unknown parameter θ.
Sampling Error
is given by the absolute value of the difference between the estimated value and true value of the parameter of interest.
| U-θ |
Geometrically: Sampling error measures the distance between the two points U and θ.
Unbiased Estimator
U attempting to estimate θ such that E(U) = θ.
Bias
E(U) - θ
More efficient
Suppose U and W are two unbiased estimators for θ. Then U is ___________ provided that it has a smaller variance.
Relative Efficiency
of U to W:
100[V(W)/V(U)]
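As a quick illustration of the formula above, here is a minimal sketch. The function name and the two variances are hypothetical, chosen only to show the arithmetic.

```python
def relative_efficiency(var_u: float, var_w: float) -> float:
    """Relative efficiency of U to W, as a percentage: 100 * V(W) / V(U)."""
    if var_u <= 0 or var_w <= 0:
        raise ValueError("variances must be positive")
    return 100.0 * var_w / var_u


# Hypothetical variances: V(U) = 2.0 and V(W) = 4.0.
efficiency = relative_efficiency(2.0, 4.0)
print(efficiency)  # 200.0
```

A value above 100 means U has the smaller variance, so U is the more efficient of the two unbiased estimators.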
Confidence Level
the percentage 100(1-α)%
Confidence Interval
Sample Mean ± Critical Value * Standard Error
Margin of Error
limit on the size of sampling error.
P(sampling error<MOE)= 1-α
Prototype Confidence Interval for the Mean- Assumptions
Normal Population
VSRS
σ is known
Prototype Confidence Interval for the Mean- Calculation
xbar ± Z_{α/2} (σ/√n)
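The calculation above can be sketched with Python's standard library. The function name and the sample figures (xbar = 50, σ = 10, n = 25) are assumptions made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    moe = z * sigma / sqrt(n)                # margin of error
    return xbar - moe, xbar + moe


# Hypothetical summary statistics, for illustration only.
low, high = z_interval(xbar=50.0, sigma=10.0, n=25, alpha=0.05)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```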
Interpretation of Confidence Level
Under repeated random sampling, the statistical procedure yields an interval containing the true value θ of the parameter of interest 100(1-α)% of the time.
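The "repeated sampling" interpretation can be checked with a small simulation. This is a sketch under assumed values (a normal population with θ = 10, σ = 2, samples of size 16, a fixed seed), not a proof. It counts how often the known-σ interval contains θ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(theta=10.0, sigma=2.0, n=16, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated intervals xbar ± Z_{alpha/2}(sigma/sqrt(n)) that contain theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # The interval contains theta exactly when the sampling error is below the MOE.
        if abs(xbar - theta) < moe:
            hits += 1
    return hits / trials


rate = coverage()
print(rate)  # expected to be close to 0.95
```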
Large Sample Confidence Interval- Assumptions
VSRS
n is sufficiently large (i.e., n is greater than 30)
σ is unknown
Large Sample Confidence Interval- Calculations
xbar ± Z_{α/2} (s/√n)
T-Distributions appear to be bell-shaped but actually...
have fatter tails than normal curves
Variance of a T-Distribution
df/(df-2), defined for df > 2
is greater than 1 but converges to 1 as df-->infinity
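A short sketch of how quickly the variance approaches 1. The function name and the chosen df values are illustrative only.

```python
def t_variance(df):
    """Variance of a t-distribution with df degrees of freedom (requires df > 2)."""
    if df <= 2:
        raise ValueError("the variance df/(df-2) requires df > 2")
    return df / (df - 2)


for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_variance(df), 4))
```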
T-Distribution converges to...
standard normal distribution as df-->infinity
Small Sample Confidence Interval- Assumptions
VSRS
normal population
σ is unknown
Small Sample Confidence Interval for the Mean- Calculations
xbar ± t_{α/2} (s/√n)
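A minimal sketch of the small-sample calculation. The standard library has no t quantile function, so the critical value is passed in as it would be read from a t-table. The data are made up, and 2.776 is the table value of t_{0.025} with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(sample, t_crit):
    """Confidence interval xbar ± t_crit * s / sqrt(n), with t_crit supplied from a t-table."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    moe = t_crit * s / sqrt(n)
    return xbar - moe, xbar + moe


# Hypothetical sample of size 5, so df = n - 1 = 4 and t_{0.025} = 2.776.
data = [4.0, 5.0, 6.0, 5.0, 5.0]
low, high = t_interval(data, t_crit=2.776)
print(round(low, 2), round(high, 2))  # 4.12 5.88
```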
Large Sample Confidence Interval for the Proportion- Assumptions
Binomial Population
VSRS
n is sufficiently large (np≥5, n(1-p)≥5)
Large Sample Confidence Interval for the Proportion- Calculation
pbar ± Z_{α/2} √(pbar(1-pbar)/n)
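The proportion interval can be sketched the same way. The counts used here (40 successes in 100 trials) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, alpha=0.05):
    """Large-sample confidence interval for a proportion."""
    pbar = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    moe = z * sqrt(pbar * (1 - pbar) / n)  # critical value times standard error
    return pbar - moe, pbar + moe


# Hypothetical data: 40 successes out of 100 trials.
p_low, p_high = proportion_interval(40, 100)
print(round(p_low, 3), round(p_high, 3))  # 0.304 0.496
```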
Null Hypothesis
statement about the population of interest.
Alternative Hypothesis
statement about the population of interest that contradicts the null hypothesis.
Test Statistic
sample statistic that is used to summarize the evidence.
Significance
probability of making a Type I error, i.e., rejecting the null hypothesis when it is actually true; denoted α.
Fisher's Theory
(Heuristic) The P-value is the probability of observing a value of the test statistic "at least as extreme as" what was actually observed.
-focuses on type I error (false positive)
Interpreting P-Values
the P-value is computed by temporarily assuming the null hypothesis is true.
a low P-value casts doubt upon the assumption that the null hypothesis is true. Thus, we reject it.
a high P-value is consistent with the assumption that the null hypothesis is true. Thus, we do not reject it.
Neyman-Pearson Theory
power calculations
the p-value is the smallest significance level for which the null hypothesis can be rejected.
Type I Error
false positive
Type II Error
false negative
Decision Rule
specifies the set of values of the test statistic for which the null hypothesis is rejected in favor of the alternative hypothesis and the set of values for which the null hypothesis is accepted.
Rejection Region
consists of all values of the test statistic for which the null hypothesis is rejected.
Acceptance Region
consists of all values of the test statistic for which the null hypothesis is accepted.
Critical Value
value that separates the rejection region from the acceptance region.
Power
1-β, where β is the probability of making a Type II error
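To make the definition concrete, here is a hedged sketch of a power calculation for one specific setting that these notes do not spell out: an upper-tailed z-test for a mean with known σ, testing a null value mu0 against a single alternative value mu1. The function name and all numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value for the test statistic
    shift = (mu1 - mu0) * sqrt(n) / sigma     # mean of the z statistic under mu1
    beta = std_normal.cdf(z_alpha - shift)    # P(fail to reject H0 | mu = mu1)
    return 1 - beta


# Hypothetical values: mu0 = 100, mu1 = 105, sigma = 15, n = 36.
power = upper_tail_power(100.0, 105.0, 15.0, 36)
print(round(power, 3))
```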
Large Sample Hypothesis Test for the Proportion- Assumptions
Binomial Population
VSRS
n is sufficiently large (based on null hypothesis value of p)
Large Sample Hypothesis Test for the Proportion- Calculations
z = (pbar - pnot) / √(pnot(1-pnot)/n)
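A minimal sketch of the test statistic and its P-value. It assumes a two-sided alternative, which the card itself does not specify, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """z statistic and two-sided P-value for H0: p = p0 (two-sidedness is an assumption)."""
    pbar = successes / n
    z = (pbar - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses the null value p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5.
z_stat, p_val = proportion_z_test(60, 100, 0.5)
print(round(z_stat, 2), round(p_val, 4))  # 2.0 0.0455
```

With these numbers the null hypothesis would be rejected at the 5% significance level but not at the 1% level, since the P-value is the smallest significance level at which rejection is possible.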
P-Value
probability of observing a value of the test statistic "at least as extreme as" what was actually observed.
the p-value is the smallest significance level for which the null hypothesis can be rejected.
**p-value is the observed significance level**
Regression Statistics- Multiple R
positive square root of R Square. Large value indicates a good fit.
Regression Statistics-R Square
number indicates that ___% of the variation in the response variable Y is explained by its linear relationship with the explanatory variable X.
Regression Statistics-Adjusted R Square
adjustment of R Square for degrees of freedom. The adjustment serves as a penalty for using too many explanatory variables.
Regression Statistics-Standard Error
also called the "Standard Error of Regression"; it estimates the standard deviation of the error term specified in the model. A relatively small standard error is another indication of a good fit.
ANOVA- First Column
"Regression" always refers to the part that is explained and "Residual" to the part that is unexplained. The "Total" is typically the sum of the explained and unexplained components, wherever this way of thinking is useful.
ANOVA- Second Column
analyzes degrees of freedom. The total degrees of freedom is always n-1. If there is only one explanatory variable, Regression df=1. The Residual degrees of freedom is what is left over (e.g., with n=20 observations: 19-1=18).
ANOVA- Third Column
analyzes the variation in the response variable in terms of Sum of Squares. The Total Sum of Squares is simply the sum of the squared deviations from the sample mean for Y. The Regression SS is the variation in the response variable Y that is explained by the regression line. Residual SS is interpreted as the variation in the response variable Y that is not explained by the linear regression line.
ANOVA- Fourth Column
Mean Square is a Sum of Squares divided by the appropriate degrees of freedom. It is easy to check that the square root of the Residual MS is equal to the Standard Error of Regression listed in the Regression Statistics.
ANOVA- Fifth Column
F-Statistic is the ratio of MS Regression to MS Residual. A large value of F supports the alternative hypothesis that the linear regression model fits the data against the null hypothesis that the linear regression model does not fit the data.
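The regression statistics and ANOVA columns described above can be reproduced by hand for a model with one explanatory variable. The sketch below uses a small made-up data set. The function name and dictionary keys are hypothetical, and the adjusted R Square line uses the standard formula 1 - (1 - R²)(n-1)/(n-2) for one explanatory variable.

```python
from math import sqrt


def simple_regression_anova(x, y):
    """Least-squares fit of y on one x, returning the ANOVA and regression statistics."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - y_mean) ** 2 for yi in y)              # Total SS
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained part
    ss_reg = ss_total - ss_resid                                 # explained part

    df_total, df_reg = n - 1, 1
    df_resid = df_total - df_reg
    ms_reg, ms_resid = ss_reg / df_reg, ss_resid / df_resid
    r_square = ss_reg / ss_total
    return {
        "multiple_r": sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - 2),
        "standard_error": sqrt(ms_resid),  # square root of the Residual MS
        "df": (df_reg, df_resid, df_total),
        "ss": (ss_reg, ss_resid, ss_total),
        "ms": (ms_reg, ms_resid),
        "f": ms_reg / ms_resid,
    }


# Made-up data, for illustration only.
table = simple_regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in table.items():
    print(name, value)
```

For this data set, R Square works out to 0.6 and F to 4.5, up to floating-point rounding.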