What represents the distance of a data value from the mean in terms of standard deviations?
z-score

How do you calculate the population z-score?
z = (x − μ)/σ

How do you calculate the sample z-score?
z = (x − x̄)/s

What is unitless, has mean = 0, and has standard deviation = 1?
z-score

The median is a special case of a general concept called the _______.
percentile

In a set of data, what is a value such that k percent of the observations are less than or equal to the value?
kth percentile

Define quartiles.
Quartiles divide the data set into quarters, or fourths. The quartiles are the 25th, 50th, and 75th percentiles, where Q1 = 25th percentile, Q2 = 50th percentile (i.e., the median), and Q3 = 75th percentile.

Identify the steps to finding quartiles.
Step 1. Arrange the data in increasing order.
Step 2. Determine the median, M, or second quartile, Q2.
Step 3. Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1, is the median of the bottom half and the third quartile, Q3, is the median of the top half.

Quartiles are ________ to extreme values.
resistant

What is the range of the middle 50% of the observations in a data set? In other words, it is the difference between the third and first quartiles.
IQR, the interquartile range

What is the formula for IQR?
IQR = Q3 − Q1

What are extreme observations in the data?
Outliers

Why should outliers be investigated?
Outliers should be investigated because they could be the result of chance, measurement error, data-entry error, or sampling error. Outliers are not necessarily invalid data, but they should be recognized.

Describe the steps in checking for outliers by using quartiles.
Step 1. Determine the first and third quartiles of the data.
Step 2. Compute the interquartile range. The interquartile range, or IQR, is the difference between Q3 and Q1.
Step 3. Determine the fences: the lower fence and the upper fence.
Step 4. Values less than the lower fence or more than the upper fence could be considered outliers.

What is the formula for the lower fence?
LF = Q1 − 1.5(IQR)

What is the formula for the upper fence?
UF = Q3 + 1.5(IQR)

What is the collection of the smallest value, the first quartile (Q1 or P25), the median (Q2 or P50), the third quartile (Q3 or P75), and the largest value?
The five-number summary

What kind of graph can illustrate the five-number summary?
A boxplot

Describe the distribution (figure not shown).
Skewed right

Describe the distribution (figure not shown).
Symmetric

Describe the distribution (figure not shown).
Skewed left

Data for a single variable is called _____.
univariate data

In a data set, the relation between two variables is called _______.
bivariate data

What is the variable whose value can be explained by the value of the explanatory or predictor variable?
response variable

What is a graph that shows the relationship between two quantitative variables? Each individual is represented by a point in the diagram: the explanatory variable, x, is plotted on the horizontal scale and the response variable, y, is plotted on the vertical scale.
A scatter diagram

Explain why it is necessary to show a scatter diagram with the correlation coefficient when claiming that a linear relation exists between two variables.
Influential observations can cause the correlation coefficient to increase substantially, thereby increasing the apparent strength of the linear relation between two variables.

positive linear association
Above-average values of one variable are associated with above-average values of the other, and below-average values of one variable are associated with below-average values of the other. In other words, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.
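Worked example for the quartile and outlier cards earlier in this set. This is only a minimal sketch: it assumes the median-of-halves method from the quartile steps, the usual fences Q1 − 1.5(IQR) and Q3 + 1.5(IQR), and a made-up data set. The function names are illustrative, not from any course material.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    data = sorted(values)             # Step 1: arrange in increasing order
    n = len(data)
    q2 = median(data)                 # Step 2: the median, M
    lower = data[: n // 2]            # Step 3: observations below M
    upper = data[(n + 1) // 2:]       #         observations above M
    return median(lower), q2, median(upper)


def fences(values):
    """Return (lower fence, upper fence) from Q1, Q3 and the IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1                     # IQR = Q3 - Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values below the lower fence or above the upper fence."""
    lf, uf = fences(values)
    return [v for v in values if v < lf or v > uf]


# Hypothetical data, invented for this example only.
sample = [2, 4, 5, 7, 8, 9, 10, 12, 40]
print(quartiles(sample))
print(fences(sample))
print(outliers(sample))  # [40]
```

Other quartile conventions (for example, interpolation-based percentiles) can give slightly different Q1 and Q3 values, so the fences may differ from software output.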
negative linear association
Above-average values of one variable are associated with below-average values of the other, and below-average values of one variable are associated with above-average values of the other. In other words, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

What is a measure of the strength of the linear relation between two quantitative variables?
linear correlation coefficient

r is always between _ and _ inclusive.
r is always between −1 and +1 inclusive. That is, −1 ≤ r ≤ 1.

If r = +1, then a ____ _____ ____ relation exists between the two variables.
If r = +1, then a perfect positive linear relation exists between the two variables.

If r = −1, then a ____ _____ _____ relation exists between the two variables.
If r = −1, then a perfect negative linear relation exists between the two variables.

Positive values of r correspond to evidence of ____ ______ between the two variables.
Positive values of r correspond to evidence of positive association between the two variables.

Negative values of r correspond to evidence of _____ _______ between the two variables.
Negative values of r correspond to evidence of negative association between the two variables.

If r is close to 0, then _____ ___ _____ exists of a linear relation between the two variables.
If r is close to 0, then little or no evidence exists of a linear relation between the two variables.

T/F: r close to 0 does not imply no relation, just no linear relation.
True

r is a _______ measure.
r is a unitless measure.

r is not _________. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
r is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.

Define r for the following graph (scatter diagram not shown).
Perfect positive linear relation, r = 1

Define r for the following graph (scatter diagram not shown).
Strong positive linear relation, r ≈ 0.9

Define r for the following graph (scatter diagram not shown).
Moderate positive linear relation, r ≈ 0.4

Define r for the following graph (scatter diagram not shown).
Perfect negative linear relation, r = −1

Define r for the following graph (scatter diagram not shown).
Strong negative linear relation, r ≈ −0.9

Define r for the following graph (scatter diagram not shown).
Moderate negative linear relation, r ≈ −0.4

Define r for the following graph (scatter diagram not shown).
No linear relation, r close to 0

T/F: Correlation does not imply causation.
True. Just because two variables are correlated does not mean that one causes the other to change. One way that two variables can be related even though there is no causal relation is through a lurking variable.

What is the line that best describes the linear relationship?
regression line

What is the difference between the observed values and the predicted values?
Residuals, or errors

least-squares regression line
The line that minimizes the sum of the squared errors

Identify the equation of the least-squares regression line.
ŷ = b1x + b0

For ŷ = b1x + b0, b1 represents...
the slope of the line

What is the formula for residuals?
Residual = observed y − predicted y, or Residual = y − ŷ

What point is always contained in the least-squares regression line?
The point (x̄, ȳ), the mean of x and the mean of y

Interpreting the slope of a regression line:
For each one-unit increase in the explanatory variable, the response variable will increase or decrease on average by b1.

Interpretation of the y-intercept:
We interpret the y-intercept as the value of the response variable when the value of the explanatory variable is 0. Sometimes the y-intercept will not make sense and will not be interpretable, but it is still needed in the model.

What is the use of a regression line for predictions outside the range of x values used to obtain the line?
Extrapolation

Why is it highly advised to use the regression model only to make predictions within the given range of data?
Making predictions using values that are outside those observed in the data can be very dangerous in practice. We cannot be certain of the behavior of the data for which we have no observations.

Of what is this an example? (figure not shown)
Extrapolation: the model is being used to predict values outside of the observed data.

What does R² represent?
coefficient of determination

What measures the percent of total variation in the response variable that is explained by the least-squares regression line?
The coefficient of determination, R², which measures how well ŷ describes the relationship between the two variables.

R² close to 0 indicates _______.
a model with very little explanatory power

R² close to 1 indicates _______.
a model with much explanatory power

Why do we analyze residuals?
1) To determine if the linear model is appropriate
2) To determine whether the variance of the residuals is constant
3) To check for outliers

T/F: A correlation coefficient that indicates a linear relation between two variables does not by itself imply that the relation is linear.
True

To determine if a linear model is appropriate we need to also draw a __________ ________.
To determine if a linear model is appropriate we need to also draw a residual plot.

What is a scatter diagram with the residuals on the vertical axis and the explanatory variable on the horizontal axis?
Residual plot

T/F: If a plot of the residuals against the explanatory variable shows any discernible pattern, such as a curve, then the explanatory and response variables may not be linearly related.
True

T/F: If residuals are scattered randomly around 1, chances are your data fit a linear model.
False: If residuals are scattered randomly around 0, chances are your data fit a linear model.

What is constant error variance, or homoscedasticity?
The requirement that the spread of the residuals stays the same across all values of the explanatory variable. If a plot of the residuals against the explanatory variable shows the spread of the residuals increasing or decreasing as the explanatory variable increases, then this strict requirement of the linear model is violated.

Of what is this an example? (figure not shown)
Homoscedasticity

Of what is this an example? (table not shown)
Contingency tables, or two-way tables

What is the first step in analyzing contingency tables?
The first step in analyzing contingency tables is to analyze each of the variables separately: the row variable by itself and the column variable by itself. The variables, when analyzed separately, have their marginal distributions.

How do we compute the relative frequency marginal distributions?
Divide the row marginal frequencies by the grand total to get the row relative frequency marginal distribution, and divide the column marginal frequencies by the grand total to get the column relative frequency marginal distribution.

What lists the relative frequency of each category of a variable, given a specific value of the other variable in the contingency table?
A conditional distribution

probability
The measure of the likeliness of an event occurring

T/F: Probability deals with summarizing data.
False. Probability deals with predicted outcomes.

T/F: Probability relates long-term results to short-term results.
False. Probability relates short-term results to long-term results.

Define: as the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.
The Law of Large Numbers

What is a repeatable process where the results are uncertain?
experiment

What is an outcome?
one specific possible result

What is the set of all possible outcomes?
sample space, S

What is a collection of possible outcomes?
event, E

simple events, e
Events with one outcome

What are the rules of probabilities? (2)
1. The probability of any event, P(E), must be greater than or equal to zero and less than or equal to one, i.e., 0 ≤ P(E) ≤ 1.
2. The sum of the probabilities of all outcomes must be equal to one.

T/F: Probabilities can be written as decimals, percents, and fractions.
True

If an event is impossible, then its probability must be equal to ____.
zero

If an event is a certainty, then its probability must be equal to ____.
one

An unusual event is one that has ____ probability of occurring.
An unusual event is one that has a low probability of occurring, that is, 5% or less.

What is the formula for approximating probabilities using the empirical approach?
P(E) ≈ relative frequency of E = (frequency of E) / (number of trials of the experiment)

If we do not know the probability of a certain event E, we can conduct a series of experiments to approximate it. This is called ______.
Approximating probabilities using the empirical approach

What is the formula for computing probabilities using the classical method?
P(E) = N(E) / N(S), where N(E) is the number of outcomes in E and N(S) is the number of outcomes in the sample space S

T/F: The classical method applies to experiments where all possible outcomes are equally likely.
True

What is subjective probability?
A subjective probability is a person's estimate of the chance of an event occurring. This probability is based on personal judgment.

T/F: Subjective probabilities should still be between zero and one, but must obey the laws of probability.
False. Subjective probabilities should still be between zero and one, but may not obey the laws of probability.

An economist predicting there is a 20% chance of recession next year would be an example of what?
subjective probability
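Worked example comparing the classical and empirical approaches. This is only a rough sketch under stated assumptions: the experiment is one roll of a fair six-sided die, the event is "the roll is even", and the rolls are simulated with Python's random module using a fixed seed. The function names are illustrative.

```python
import random
from fractions import Fraction

# Sample space S for one roll of a fair die (assumed example).
SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]


def is_even(outcome):
    """Event E: the roll is an even number."""
    return outcome % 2 == 0


def classical_probability(event, sample_space):
    """P(E) = N(E) / N(S); valid only when outcomes are equally likely."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))


def empirical_probability(event, sample_space, trials, seed=0):
    """P(E) ~ (frequency of E) / (number of trials), from simulated rolls."""
    rng = random.Random(seed)  # fixed seed so reruns give the same result
    hits = sum(1 for _ in range(trials) if event(rng.choice(sample_space)))
    return hits / trials


print(classical_probability(is_even, SAMPLE_SPACE))  # 1/2

# Law of Large Numbers: the relative frequency should tend to settle
# near 1/2 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(is_even, SAMPLE_SPACE, n))
```

The classical value is exact, while the empirical values are only approximations and will change with the seed and the number of trials.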