Math 1040 Chapter 3

Card Set Information

Math 1040 Chapter 3
2015-02-09 20:09:40
Alia alia Maw maw math 1040 slcc statistics

Flashcards for chapter 3 key terms and principles

  1. The relation between the ____, ______, and ________ are guidelines. These guidelines tend to hold up well for __________ data, but when the data are ________ the rules can be easily ________.
    mean, median, skewness, continuous, discrete, violated
  2. What is the relationship between the mean, median, and distribution shape?
    • skewed left = Mean substantially smaller than median
    • symmetric = Mean roughly equal to median
    • skewed right = Mean substantially larger than median
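
A small numeric illustration of the skewed-right case. The `incomes` list is made-up data, chosen so that one large value pulls the mean above the median:

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [30, 32, 35, 38, 40, 41, 250]

data_mean = mean(incomes)      # pulled toward the large value (466 / 7)
data_median = median(incomes)  # middle value of the sorted data (38)

print(data_mean > data_median)  # the skewed-right pattern from the card
```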
  3. What does mu (mew) represent?
    a population parameter. it is computed using data from all the individuals in a population
  4. What does x bar represent?
    a sample statistic. it is computed using data from individuals in a sample.
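  5. How is μ (mu) calculated? (Hint: Greek letters are used for parameters)
    $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$, or $\mu = \frac{\sum x_i}{N}$, where N is the population size
  6. How is x-bar calculated? (Hint: Roman letters are used for statistics)
    $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, or $\bar{x} = \frac{\sum x_i}{n}$, where n is the sample size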
  7. what is the mean?
    the sum of all the values of the variable in a data set divided by the number of observations

    Important: use the mean when the data are quantitative and the frequency distribution is roughly symmetric. Trimmed means are typically resistant.
  8. what is the median?
    the observation in the middle of a set of observed values of the variable; it is determined once the data have been arranged in ascending order.

    an odd number of observations indicates that the median is the one observation that falls in the middle of the arranged set (the observation in position $\frac{n+1}{2}$).

    an even number of observations indicates that the median is the mean of the two central observations (the mean of the observations in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$).

    when one of the middle numbers needed to find the median is missing (the data set contains an even number of values) and the median M is known, find the missing middle number by solving $M = \frac{x_{m1} + x_{m2}}{2}$, where xm1 is the first middle value and xm2 is the second middle value.

    Important: used when the data are quantitative and the frequency distribution is skewed left or right
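
The odd and even cases can be sketched in a few lines. The function name `median_by_position` is only illustrative:

```python
def median_by_position(values):
    """Median found by sorting and picking the middle position(s)."""
    data = sorted(values)          # arrange in ascending order first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd n: the single middle observation
    # even n: mean of the two middle observations
    return (data[mid - 1] + data[mid]) / 2

print(median_by_position([7, 1, 3]))     # sorted: 1, 3, 7
print(median_by_position([4, 1, 3, 2]))  # sorted: 1, 2, 3, 4
```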
  9. when the ____ and ______ are close together in value, we use the ____ as the measure of central tendency. when the data set is ______ the ______ is the preferred measure of central tendency.
    mean, median, mean, skewed, median
  10. what is the mode?
    the observation of the variable that occurs most frequently in the data set (Note: the mode can be computed for either quantitative or qualitative data).

    when no observation occurs more than once, there will be no mode.

    Important: use when the most frequent observation is the desired measure of central tendency or when the data are qualitative.
  11. define bimodal & multimodal?
    • bimodal = the presence of two modes in a data set
    • multimodal = the presence of three or more modes in a data set

    The mode is usually not reported for multimodal data because it is not representative of a typical value.
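
A minimal sketch of finding the mode(s) by counting. The helper `modes` is a hypothetical name, and returning an empty list when nothing repeats follows this card set's "no mode" convention (some libraries behave differently):

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 7, 10]))          # one mode
print(modes([1, 1, 2, 2, 3]))              # bimodal
print(modes(["red", "blue", "red"]))       # works for qualitative data too
```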
  12. 3.2
    Measures of _______ ________ describe the typical _____ of the ________.
    central tendency, value, variable
  13. 3.2
    Define dispersion. How many numerical measures of dispersion are there? What is each called?

    Then, complete the sentence:

    We determine numerical measures of __________ to ________ the ______ of data.
    dispersion is the degree to which the data are spread out (also called spread). There are three numerical measures of dispersion: the range, the standard deviation, and variance.

    dispersion, quantify, spread
  14. 3.2
    The range of a variable is...

    T/F The range is resistant
    the difference between the largest and the smallest data value.

    R = largest data value - smallest data value

    F, the range is NOT resistant (an extreme value really has an impact on its calculation)
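
To see why the range is not resistant, compare it with and without one extreme value. The data below are made up for illustration:

```python
def data_range(values):
    """R = largest data value - smallest data value."""
    return max(values) - min(values)

typical = [12, 15, 17, 18, 20]
with_extreme = typical + [95]     # add a single extreme observation

print(data_range(typical))        # 20 - 12
print(data_range(with_extreme))   # 95 - 12
```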
  15. 3.2
    The standard deviation (σ) of a variable is...
    ...the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N:

    $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

    In other words, it is the square root of the average of the squared deviations about the population mean.
  16. 3.2
    If a data set has (many/few) observations that are "(near/far)" from the mean, then the (square root/sum) of the squared deviations will be (small/large) and the standard deviation will be (small/large).
    many, far, sum, large, large
  17. 3.2
    The sample standard deviation is...
    ...the square root of the sum of squared deviations about the sample mean divided by n-1 (n = sample size): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
  18. 3.2
    Why use n-1 in sample standard deviation? What is "n-1" (degrees of freedom)?
    we divide by "n-1" in a sample because we already know that the sum of the deviations about the mean, $\sum (x_i - \bar{x})$, must equal zero. If the mean and the first n-1 observations are known, then the nth observation has to be the value that causes the deviations' sum to equal zero.

    n-1 is the degrees of freedom because the first n-1 observations have freedom to be any value, but the nth observation has no freedom. It must be whatever value forces the sum of the deviations about the mean to be zero.
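
A short sketch comparing the two divisors, N for a population and n-1 for a sample, on the same made-up data. The function names are illustrative, and the last line checks that the deviations about the mean sum to zero:

```python
from math import sqrt

def population_sd(values):
    """Square root of the average squared deviation about the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """Same sum of squared deviations, but divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean is 5, squared deviations sum to 32
deviation_sum = sum(x - 5 for x in data)  # deviations about the mean

print(population_sd(data))   # sqrt(32 / 8)
print(sample_sd(data))       # sqrt(32 / 7), slightly larger
print(deviation_sum)         # always zero about the mean
```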
  19. when computing sample standard deviation, be sure to use x-bar with as (many/few) decimal places as possible to avoid round-off error. However, report the standard deviation to (one/two) more decimal place(s) than the original data
    many, one
  20. Is standard deviation resistant? Why?
    No, because one extreme value has a huge impact on the value of the standard deviation.
  21. the ____ and the ________ _________ are used ________. the ____ measures the ______ of the data distribution and the ________ _________ measures the ______.

    The (greater/lesser) the standard deviation, the (greater/lesser) the spread of the distribution.

    A standard deviation of (zero, one) suggests that there (is/is not) spread in the data. All the values in the data set are (the same/different).
    mean, standard deviation, together, mean, center, standard deviation, spread

    greater, greater

    zero, is not, the same
  22. 3.2
    What is variance?
    The square of the standard deviation. The population variance is $\sigma^2$, and the sample variance is $s^2$. Variance is measured in squared units, which makes it difficult to interpret.
  23. 3.2
    What happens if the sample variance is computed by dividing by n instead of n-1?
    Then the sample variance would consistently underestimate the population variance. Whenever a statistic consistently underestimates a parameter, it is said to be biased. (Don't be too concerned about this for this class).
  24. 3.2
    If data have a ____________ that is ____-______, the Empirical Rule ___ __ ____ to determine the percentage of data that will lie within k standard deviations of the mean. What is the Empirical Rule?
    distribution, bell-shaped, can be used

    If a distribution is roughly bell shaped, then approximately 68% of the data will lie within 1 standard deviation of the mean. That is, approximately 68% of the data will lie between μ−1σ and μ+1σ. 

    Approximately 95% of the data will lie within 2 standard deviations of the mean. That is, approximately 95% of the data will lie between μ−2σ and μ+2σ.

    Approximately 99.7% of the data will lie within 3 standard deviations of the mean. That is, approximately 99.7% of the data will lie between μ−3σ and μ+3σ.

    The Empirical Rule can also be used on sample data with x-bar in place of μ and s in place of σ.
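
The three intervals can be tabulated for any bell-shaped variable. This sketch uses a hypothetical mean of 100 and standard deviation of 15:

```python
def empirical_rule_intervals(mean, sd):
    """Map k -> (lower bound, upper bound, approximate percent within k sd)."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}

intervals = empirical_rule_intervals(100, 15)
for k, (low, high, pct) in intervals.items():
    print(f"about {pct}% of the data lie between {low} and {high}")
```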
  25. 3.2
    What is Chebyshev's Inequality?
    In probability theory, Chebyshev's inequality guarantees that in ANY probability distribution, "nearly all" values are close to the mean. Precisely, no more than $1/k^2$ of the distribution's values can be more than k standard deviations away from the mean.
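
The bound is easy to tabulate. This sketch uses the standard form of the inequality (at most 1/k² of the values lie more than k standard deviations from the mean, so at least 1 − 1/k² lie within). The function name is illustrative, and k is restricted to values above 1, where the bound says something useful:

```python
def chebyshev_min_within(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

print(chebyshev_min_within(2))   # at least 75% within 2 standard deviations
print(chebyshev_min_within(3))   # at least about 88.9% within 3
```

Compared with the Empirical Rule's 95% within 2 standard deviations, the 75% guarantee is weaker, but it holds for any distribution shape.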
  26. 3.2
    The coefficient of variation (CV) is...

    What does the CV allow for?
    ...the ratio of the standard deviation to the mean of a data set.

    CV = standard deviation/mean

    The CV allows for a comparison in spread by describing the amount of spread per mean unit.
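
Because the CV is a ratio, it has no units, which is what makes comparisons possible. A sketch with made-up heights, assuming the sample standard deviation is the one wanted:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed here)."""
    return stdev(values) / mean(values)

heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]   # same data in different units

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)   # the two agree up to floating-point rounding
```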
  27. 3.3 - Approximate the mean of a variable from grouped data

    What is the formula for approximating the population mean?

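    $\mu = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n}$

    • $x_i$ is the midpoint of the ith class
    • $f_i$ is the frequency of the ith class
    • n is the number of classes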

    recall that the class midpoint is the sum of the consecutive lower class limits divided by 2.
  28. 3.3 - Approximate the mean of a variable from grouped data (e.g., approximate the mean cost of $ spent on pizza from a set of grouped data)

    What is the formula for the sample mean?

    $\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n}$

    • $x_i$ is the midpoint of the ith class
    • $f_i$ is the frequency of the ith class
    • n is the number of classes

    recall that the class midpoint is the sum of consecutive lower class limits divided by 2.

    this formula is exactly the same as the formula for the population mean but it is denoted by x-bar
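
A sketch of the grouped-data approximation, assuming the usual form: the sum of (class midpoint × class frequency) divided by the sum of the frequencies. The class limits and frequencies are made up, and the function names are illustrative:

```python
def class_midpoints(lower_limits):
    """Midpoint = sum of consecutive lower class limits divided by 2.

    The final entry is the lower limit of the class that would follow the last one.
    """
    return [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of midpoint * frequency over sum of frequencies."""
    total = sum(x * f for x, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

midpoints = class_midpoints([0, 10, 20, 30])   # classes 0-9.99, 10-19.99, 20-29.99
frequencies = [4, 10, 6]
print(midpoints)
print(grouped_mean(midpoints, frequencies))     # (20 + 150 + 150) / 20
```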
  29. 3.3
    When data values have different importance, or _______, associated with them, we compute the ________ ____.

    Explain how to compute the weighted mean and give its formula.
    weights, weighted mean

    The weighted mean of a variable is found by multiplying each value of the variable by its corresponding weight, adding these products, and dividing this sum by the sum of the weights.


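    $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

    • $w_i$ is the weight of the ith observation
    • $x_i$ is the value of the ith observation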
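
The verbal recipe above translates directly into code. The example is a hypothetical grade-point average in which credit hours serve as the weights:

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight, add the products, divide by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4.0, 3.0, 2.0]   # made-up course grades
credits = [3, 4, 1]              # made-up credit hours (the weights)

gpa = weighted_mean(grade_points, credits)
print(gpa)   # (12 + 12 + 2) / 8
```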
  30. 3.3
    The procedure for approximating the standard _________ from _______ ____ is _______ to that of finding the mean from _______ ____. Because we do not have access to the original data, the standard deviation is ___________.

    Give the formulas for approximating the population standard deviation and sample standard deviation of a variable from a frequency distribution.
    deviation, grouped data, similar, grouped data, approximate

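    Population std. dev.: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}}$

    Sample std. dev.: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}}$

    • $x_i$ is the midpoint or value of the ith class
    • $f_i$ is the frequency of the ith class
    • n is the number of classes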
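
A sketch of the grouped-data approximation, assuming the usual form: squared deviations of the class midpoints from the grouped mean, each weighted by its class frequency, divided by the total frequency (population) or the total frequency minus 1 (sample). The data and the function name are made up:

```python
from math import sqrt

def grouped_sd(midpoints, frequencies, sample=True):
    """Approximate standard deviation from class midpoints and frequencies."""
    total_freq = sum(frequencies)
    center = sum(x * f for x, f in zip(midpoints, frequencies)) / total_freq
    squared = sum((x - center) ** 2 * f for x, f in zip(midpoints, frequencies))
    divisor = total_freq - 1 if sample else total_freq
    return sqrt(squared / divisor)

mids = [5, 15, 25]
freqs = [4, 10, 6]   # grouped mean is 16, weighted squared deviations sum to 980

print(grouped_sd(mids, freqs, sample=False))  # sqrt(980 / 20)
print(grouped_sd(mids, freqs, sample=True))   # sqrt(980 / 19)
```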
  31. 3.4
    The _______ represents the ________ that a data value is from the ____ in terms of the number of ________ __________.

    what are the z-score formulas for a population and a sample?
    z-score, distance, mean, standard deviations

    z-scores are rounded to the nearest hundredth (unless otherwise specified) and can be negative or positive.

    $z = \frac{x - \mu}{\sigma}$ for a population

    $z = \frac{x - \bar{x}}{s}$ for a sample
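
A z-score is the distance from the mean divided by the standard deviation. The arithmetic is the same for a population or a sample; only the symbols differ. A sketch with hypothetical values, rounding to the nearest hundredth as the card describes:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean, rounded to hundredths."""
    return round((x - mean) / sd, 2)

print(z_score(130, 100, 15))   # above the mean: positive
print(z_score(82, 100, 15))    # below the mean: negative
```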
  32. 3.4
    The kth percentile, denoted __, of a set of data is a value such that _ percent of the observations are (less than/greater than) or (equal/not equal) to the value.

    Recall that the ______ divides the lower 50% of a data set from the upper 50%. The ______ is a special case of a general concept called the __________.
    $P_k$, k, less than, equal

    median, median, percentile
  33. 3.4
    The most common ___________ are _________, which divide data sets into (halves/thirds/fourths/fifths), or ____ equal parts.
    percentiles, quartiles, fourths, four
  34. 3.4
    What are the first, second, and third quartiles equal to?
    first is = to the 25th percentile

    second is = to the 50th percentile, which is equal to the median

    third is = to the 75th percentile
  35. 3.4
    List the three steps for finding quartiles:
    1) arrange the data in ascending order

    2) determine the median, M, or second quartile

    3) divide the data set into two halves: observations less than M and observations greater than M. Q1 is the median of the bottom half, and Q3 is the median of the top half. Exclude M in these halves.

    These steps will agree with StatCrunch when the number of observations is even, but not when the number of observations is odd.
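
The procedure can be sketched as follows. Splitting by position rather than by value is an assumption made here so that repeated values equal to M are handled predictably, and the function name is illustrative:

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, M, Q3) using medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the middle observation (M itself) is left out of both halves.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

print(quartiles_by_halves([7, 1, 5, 3, 2, 6, 4]))     # odd number of observations
print(quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8]))  # even number of observations
```

Other software may use a different quartile convention, which is one reason results can differ for odd-sized data sets.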
  36. 3.4
    How does one find the interquartile range? the lower fence? the upper fence? what is the cutoff and what does cutoff mean?

    how does one discern whether a distribution is symmetric or skewed based on quartile information?
    IQR = Q3 - Q1

    • lower fence = Q1 - 1.5(IQR)
    • upper fence = Q3 + 1.5(IQR)

    cutoff is another word for fence. "the cutoff" is the higher fence (according to our homework).

    If the difference between Q1 and Q2 is significantly larger than the difference between Q2 and Q3, then the distribution is skewed left. If the difference between Q2 and Q3 is significantly larger, then the distribution is skewed right.
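
The IQR and fence formulas in a few lines, using hypothetical quartiles of 20 and 30:

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = fences(q1=20, q3=30)   # IQR = 10
print(lower, upper)                   # 20 - 15 and 30 + 15
```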
  37. 3.4
    T or F - The IQR and quartiles are not resistant
    false, the IQR and quartiles are resistant unlike the range, standard deviation, and variance of a data set
  38. 3.4
    If the shape of a distribution is symmetric, use the (mean/median) as your measure of central tendency and the (standard deviation/IQR) as your measure of dispersion.

    If the shape of a distribution is skewed left or right, use the (mean/median) as your measure of central tendency and the (standard deviation/IQR) as your measure of dispersion.
    mean, standard deviation, median, IQR

    When asked to describe the distribution, describe its shape (skewed left, skewed right, or symmetric), its center (mean or median), and its spread (standard deviation or interquartile range).
  39. 3.4
    What is an outlier?
    An extreme observation: a data value less than the lower fence or greater than the upper fence.
  40. What #s does the five number summary include?
    the minimum (smallest) data value, Q1, Q2 or the Median, Q3, and the maximum (largest) value
  41. Describe drawing a boxplot (the graph used for the five number summary) in five steps.
    Step 1 Determine the lower and upper fences:

    Lower Fence = Q1 − 1.5(IQR)
    Upper Fence = Q3 + 1.5(IQR)
    where IQR = Q3 − Q1

    Step 2 Draw a number line long enough to include the maximum and minimum values. Insert vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box.

    Step 3 Label the lower and upper fences with a temporary mark.

    Step 4 Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. These lines are called whiskers.

    Step 5 Plot any data values less than the lower fence or greater than the upper fence as outliers. Outliers are marked with an asterisk (*). Remove the temporary marks labeling the fences.
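
The numbers a boxplot needs in Steps 4 and 5 can be computed before drawing anything. This sketch takes the quartiles as given (the data and quartile values are made up) and uses a hypothetical function name. A value exactly on a fence is treated as a non-outlier here, which is an assumption:

```python
def boxplot_parts(values, q1, q3):
    """Whisker endpoints and outliers for a boxplot, given Q1 and Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Step 5: values beyond either fence are outliers.
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    # Step 4: whiskers stop at the most extreme values that are not outliers.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

parts = boxplot_parts([1, 2, 3, 4, 5, 6, 7, 8, 30], q1=2.5, q3=7.5)
print(parts)   # fences are -5 and 15, so 30 is flagged
```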
  42. Judging the shape of a distribution is a (subjective/objective/repetitive) practice.
    subjective