The flashcards below were created by user
lazvertiigo
on FreezingBlue Flashcards.

What represents the distance of a data value from the mean in terms of standard deviations?
z-score

How do you calculate the population z-score?
z = (x − μ)/σ

How do you calculate the sample z-score?
z = (x − x̄)/s

What is unitless, has mean = 0, and standard deviation = 1?
z-score
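
The z-score cards above can be checked numerically. This is a minimal sketch, assuming a small made-up data set treated as a full population; the function name `population_z_scores` is hypothetical.

```python
from statistics import mean, pstdev

def population_z_scores(values):
    """Return z = (x - mu) / sigma for each x, treating values as a population."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

# Made-up data with mu = 5 and sigma = 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = population_z_scores(data)

print(z)
print(mean(z), pstdev(z))  # the z-scores themselves have mean 0 and standard deviation 1
```

For a sample, the same idea applies with the sample mean and `statistics.stdev` in place of `pstdev`.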

The median is a special case of a general concept called the _______.
percentile

In a set of data, what is a value such that k percent of the observations are less than or equal to the value?
kth percentile

Define Quartiles.
Quartiles divide a data set into quarters (fourths). The quartiles are the 25^{th}, 50^{th}, and 75^{th} percentiles: Q_{1} = 25^{th} percentile, Q_{2} = 50^{th} percentile (i.e., the median), and Q_{3} = 75^{th} percentile.

Identify the steps to finding quartiles.
Step 1. Arrange the data in increasing order.
Step 2. Determine the median, M, or second quartile, Q_{2}.
Step 3. Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q_{1}, is the median of the bottom half and the third quartile, Q_{3}, is the median of the top half.
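
The three steps above can be sketched in Python. This is one possible reading, assuming that when the number of observations is odd the median itself is left out of both halves; other conventions exist and give slightly different quartiles. The function name `quartiles` and the data are made up.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)          # Step 1: arrange in increasing order
    n = len(data)
    q2 = median(data)              # Step 2: the median M is Q2
    half = n // 2
    lower = data[:half]            # Step 3: observations below M
    # For odd n, skip the middle observation (assumed convention)
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), q2, median(upper)

q1, q2, q3 = quartiles([20, 1, 15, 3, 12, 4, 10, 7, 8])
print(q1, q2, q3)
```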

Quartiles, on the other hand, are ________ to extreme values.
resistant

What is the range of the middle 50% of the observations in a data set? In other words, it is the difference between the third and first quartile.
IQR, the Interquartile Range

What is the formula for IQR?
IQR = Q_{3} − Q_{1}

What are extreme observations in the data?
Outliers

Why should outliers be investigated?
Outliers should be investigated because they could be chance occurrences, measurement errors, data entry errors, or sampling errors. Outliers are not necessarily invalid data, but they should be recognized.

Describe the steps in Checking for Outliers by Using Quartiles.
Step 1. Determine the first and third quartiles of the data.
Step 2. Compute the interquartile range. The interquartile range or IQR is the difference between Q3 and Q1.
Step 3. Determine the fences: The Lower Fence and The Upper Fence.
Step 4. Values less than the lower fence or more than the upper fence could be considered outliers.

What is the formula for the Lower Fence?
LF = Q_{1} − 1.5(IQR)

What is the formula for the Upper Fence?
UF = Q_{3} + 1.5(IQR)
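
The fence check can be sketched as follows, using lower fence = Q1 − 1.5(IQR) and upper fence = Q3 + 1.5(IQR). The data and the quartile values are made up for illustration, and the function names are hypothetical.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1                      # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

# Made-up data; Q1 and Q3 are assumed to have been found already
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
q1, q3 = 3.5, 13.5

print(fences(q1, q3))
print(find_outliers(data, q1, q3))
```

A flagged value is only a candidate outlier; it still needs to be investigated before being treated as an error.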

What is the collection of the smallest value, the first quartile (Q_{1} or P_{25}), the median (Q_{2} or P_{50}), the third quartile (Q_{3} or P_{75}), and the largest value?
The Five-Number Summary

What kind of graph can illustrate the five-number summary?
A Boxplot.

Describe the distribution:
Skewed Right

Describe the distribution:
Symmetric

Describe the distribution:
Skewed Left

Data for a single variable is called _____.
univariate data

In a data set, data on the relation between two variables is called _______.
bivariate data

What is the variable whose value can be explained by the value of the explanatory or predictor variable?
response variable

What is a graph that shows the relationship between two quantitative variables?
Each individual is represented by a point in the diagram: the explanatory variable, x, is plotted on the horizontal scale and the response variable, y, is plotted on the vertical scale.
A Scatter Diagram

Explain why it is necessary to show a scatter diagram with the correlation coefficient when claiming that a linear relation exists between two variables.
Influential observations can cause the correlation coefficient to increase substantially, thereby increasing the apparent strength of the linear relation between two variables.

positive linear association
Above-average values of one variable are associated with above-average values of the other, and below-average values of one variable are associated with below-average values of the other. In other words, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.

negative linear association
Above-average values of one variable are associated with below-average values of the other, and below-average values of one variable are associated with above-average values of the other. In other words, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

What is a measure of strength of linear relation between two quantitative variables?
linear correlation coefficient

r is always between ___ and ___ inclusive.
r is always between −1 and +1 inclusive. That is, −1 ≤ r ≤ 1.

If r=+1, then a ____ _____ ____ relation exists between the two variables.
If r=+1, then a perfect positive linear relation exists between the two variables.

If r=−1, then a ____ _____ _____ relation exists between the two variables.
If r=−1, then a perfect negative linear relation exists between the two variables.

Positive values of r correspond to evidence of ____ ______ between the two variables.
Positive values of r correspond to evidence of positive association between the two variables.

Negative values of r correspond to evidence of _____ _______ between the two variables
Negative values of r correspond to evidence of negative association between the two variables

If r is close to 0, then _____ ___ _____ exists of a linear relation between the two variables.
If r is close to 0, then little or no evidence exists of a linear relation between the two variables.

T/F: r close to 0 does not imply no relation, just no linear relation.
True

r is a _______ measure.
r is a unitless measure.

r is not _________. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient
r is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient
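
Several of the properties of r listed above can be checked with a short sketch. These cards do not state a formula for r, so the code assumes the common sample definition r = Σ(z_x · z_y)/(n − 1), where z_x and z_y are sample z-scores. The data and the function name `correlation` are made up.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Linear correlation coefficient as the sum of products of sample z-scores over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)    # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))    # perfect positive linear relation
print(correlation(xs, [10, 8, 6, 4, 2]))    # perfect negative linear relation

# r is unitless: rescaling x (e.g. metres to centimetres) leaves r unchanged
print(correlation([100 * x for x in xs], [2, 4, 6, 8, 10]))

# r is not resistant: one point off the pattern changes it substantially
print(correlation(xs + [6], [2, 4, 6, 8, 10, 0]))
```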

Define r for the following graph:
Perfect positive linear relation, r=1

Define r for the following graph:
Strong positive linear relation, r ≃ 0.9

Define r for the following graph:
Moderate positive linear relation, r ≃ 0.4

Define r for the following graph:
Perfect negative linear relation, r = −1

Define r for the following graph:
Strong negative linear relation, r ≃ −0.9

Define r for the following graph:
Moderate negative linear relation, r ≃ −0.4

Define r for the following graph:
No linear relation, r close to 0

Define r for the following graph:
No linear relation, r close to 0

T/F: Correlation does not imply causation.
T: Just because two variables are correlated does not mean that one causes the other to change. One way that two variables can be related even though there is not a causal relation is through a lurking variable.

What is the line that describes the linear relationship best?
regression line

What is the difference between the observed values and the predicted values?
Residuals or errors.

least-squares regression line
the line that minimizes the sum of the squared errors

Identify the equation:
Least-squares regression line
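
As an illustration of this definition, the slope and intercept that minimize the sum of squared errors can be computed directly. This is a sketch assuming the standard closed-form solution, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, which these cards do not spell out. The data and the function name `least_squares` are made up.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x that minimizes squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx                 # slope
    b0 = y_bar - b1 * x_bar          # intercept
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(b0, b1)
```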

For ŷ = b_{1}x + b_{0}, b_{1} represents...
the slope of the least-squares regression line

What is the difference between the observed values and the predicted values?
Residuals

What is the formula for Residuals?
 Residuals = observed y − predicted y
 or Residuals = y − ŷ
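
A short sketch of the residual formula. The line ŷ = 2.2 + 0.6x and the data are made up for illustration, and the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, b0, b1):
    """Return observed y minus predicted y for each point, with y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, b0=2.2, b1=0.6)

print(res)
print(sum(res))  # essentially zero here, because this line is the least-squares fit to these points
```

A positive residual means the observed value lies above the line; a negative residual means it lies below.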

What is true about the point that is always contained in the least-squares regression line?

Interpreting slope of a regression line:
For each one-unit increase in the explanatory variable, the response variable will increase or decrease, on average, by b_{1}.

Interpretation of the y-intercept:
We interpret the y-intercept as the value of the response variable when the value of the explanatory variable is 0. Sometimes the y-intercept will not make sense and will not be interpretable, but it is still needed in the model.

What is the use of a regression line for predictions outside the range of x values used to obtain the line?
Extrapolation

Why is it highly advised to only use the regression model to make prediction within the given range of data?
Making predictions using values that are outside those observed in the data can be very dangerous in practice. We cannot be certain of the behavior of the data for which we have no observations.

Of what is this an example?
Extrapolation: the model is being used to predict values outside of the observed data.

What does R^{2} represent?
coefficient of determination

What measures the percent of total variation in the response variable that is explained by the least-squares regression line?
The coefficient of determination R^{2}, which measures how well ŷ describes the relationship between the two variables.
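
One way to compute this quantity is sketched below. It assumes the usual decomposition R² = 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean; these cards do not give that formula. The line ŷ = 2.2 + 0.6x is the least-squares fit to the made-up data, and the function name `r_squared` is hypothetical.

```python
from statistics import mean

def r_squared(xs, ys, b0, b1):
    """Proportion of total variation in y explained by the line y-hat = b0 + b1 * x."""
    y_bar = mean(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total variation
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, b0=2.2, b1=0.6)
print(r2)  # about 0.6: the line explains roughly 60% of the variation in y
```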

R^{2} is close to 0 indicates_______.
a model with very little explanatory power

R^{2} is close to 1 indicates_______.
a model with much explanatory power

Why do we analyze residuals?
 1) To determine if the linear model is appropriate
 2) To determine whether the variance of the residuals is constant
 3) To check for outliers

T/F: A correlation coefficient indicating that a linear relation exists between two variables does not imply that the relation is actually linear.
True

To determine if a linear model is appropriate we need to also draw a __________ ________.
To determine if a linear model is appropriate we need to also draw a residual plot.

What is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis?
Residual Plot

T/F: If a plot of the residuals against the explanatory variable shows any discernible pattern, such as a curve, then the explanatory and response variables may not be linearly related.
True

T/F: If residuals are scattered randomly around 1, chances are your data fit a linear model.
 False:
 If residuals are scattered randomly around 0, chances are your data fit a linear model.

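What is constant error variance or homoscedasticity?
Constant error variance (homoscedasticity) means the spread of the residuals stays roughly the same across all values of the explanatory variable. If a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then this strict requirement of the linear model is violated.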

Of what is this an example?
Homoscedasticity

Of what is this an example:
contingency tables or twoway tables.

What is the first step in analyzing contingency tables?
The first step in analyzing contingency tables is to analyze each of the variables separately: the row variable by itself and the column variable by itself. The variables, when analyzed separately, have their marginal distributions.

How do we compute the relative frequency marginal distributions?
Divide the row marginal frequencies by the grand total to get the row relative frequency marginal distribution and divide the column marginal frequencies by the grand total to get the column relative frequency marginal distribution
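
The computation described above can be sketched for a small two-way table. The counts and category labels are made up, and the function name `marginal_distributions` is hypothetical.

```python
def marginal_distributions(table):
    """Return (row, column) relative frequency marginal distributions of a two-way table."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    # Row marginal frequency divided by the grand total
    row_marginals = {row: sum(cells.values()) / grand_total
                     for row, cells in table.items()}
    # Column marginal frequency divided by the grand total
    columns = list(next(iter(table.values())))
    col_marginals = {col: sum(table[row][col] for row in table) / grand_total
                     for col in columns}
    return row_marginals, col_marginals

# Made-up counts: rows are one variable, columns are the other
table = {
    "Yes": {"A": 20, "B": 30},
    "No":  {"A": 10, "B": 40},
}
rows, cols = marginal_distributions(table)
print(rows)
print(cols)
```

Each marginal distribution sums to 1, since every observation falls in exactly one row and one column.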

What lists the relative frequency of each category of a variable, given a specific value of the other variable in the contingency table?
A conditional distribution.

probability
the measure of the likeliness for an event to occur

T/F: Probability deals with summarizing data.
 False.
 Probability deals with predicted outcomes.

T/F: Probability relates long-term results to short-term results.
 False.
 Probability relates short-term results to long-term results.

Define: as the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.
The Law of Large Numbers
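
The Law of Large Numbers can be illustrated with a small simulation. This is a sketch assuming a fair coin, so the true probability of heads is 0.5; the function name `proportion_of_heads` is hypothetical, and the seed only makes the run repeatable.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Simulate n_tosses fair coin tosses and return the observed proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(0)  # fixed seed for repeatability
for n in (10, 100, 10_000, 100_000):
    # With more tosses, the proportion tends to settle closer to 0.5
    print(n, proportion_of_heads(n, rng))
```

Short runs can stray far from 0.5; the law is a statement about the long run only.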

What is a repeatable process where the results are uncertain?
experiment

What is an outcome?
one specific possible result

What is the set of all possible outcomes?
sample space, S

What is a collection of possible outcomes?
event, E

simple events, e
Events with one outcome.

What are the Rules of Probabilities?
(2)
1. The probability of any event, P(E), must be greater or equal to zero and less than or equal to 1, i.e., 0≤P(E)≤1
2. The sum of probabilities of all outcomes must be equal to one.

T/F: Probabilities can be written as decimals, percents, and fractions.
True

If an event is impossible, then its probability must be equal to ____
zero.

If an event is a certainty, then its probability must be equal to ____.
one.

An unusual event is one that has ____ probability of occurring.
An unusual event is one that has low probability of occurring, i.e., 5% or less.

What is the formula for approximating probabilities using the Empirical Approach?
P(E) ≈ Relative Frequency of E = (Frequency of E) / (Number of trials of the experiment)

If we do not know the probability of a certain event E, we can conduct a series of experiments to approximate it. This is called ______.
Approximating Probabilities Using the Empirical Approach

What is the formula for Computing Probabilities Using the Classical Method?
P(E) = N(E) / N(S), where N(E) is the number of outcomes in E and N(S) is the number of outcomes in the sample space S

T/F: The classical method applies to experiments where all possible outcomes are equally likely.
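
A worked example of the classical method, assuming two fair six-sided dice so that all 36 outcomes are equally likely. The event chosen (a sum of 7) is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space S: every ordered pair of faces from two dice
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two faces sum to 7
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(E) = N(E) / N(S)
p = Fraction(len(event), len(sample_space))
print(len(event), len(sample_space), p)
```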
True

What is subjective probability?
A subjective probability is a person’s estimate of the chance of an event occurring. This probability is based on personal judgment.

T/F: Subjective probabilities should still be between zero and one, but must obey the laws of probability.
 False.
 Subjective probabilities should still be between zero and one, but may not obey the laws of probability.

An economist predicting a 20% chance of recession next year would be an example of what?
subjective probability

