DAT QR 3 - Probability and Statistics

  1. General formula for Probability.
    • P = (# of ways the outcome of a situation can occur) / (# of possible outcomes)
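
A minimal sketch of the ratio above, assuming equally likely outcomes. The helper name `probability` and the die example are illustrative and not part of the original card.

```python
from fractions import Fraction


def probability(favorable: int, possible: int) -> Fraction:
    """Return (# of favorable outcomes) / (# of possible outcomes)."""
    if possible <= 0 or not 0 <= favorable <= possible:
        raise ValueError("need possible > 0 and 0 <= favorable <= possible")
    return Fraction(favorable, possible)


# Rolling an even number on a fair six-sided die: 3 favorable outcomes out of 6.
p_even = probability(3, 6)
print(p_even)  # 1/2
```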
  2. How do you find the probability that multiple events will all occur?
    For independent events, find the probability of each individual event and then multiply them together. (Compare a single event: flipping a coin once to get tails has a probability of 1/2.)

    e.g. flipping the coin twice - what is the P that the first flip will turn up heads and the second tails?

    • P = 1/2 x 1/2
    •    = 1/4

    Note: when you want to know the P of event A or B (and the two cannot both occur), the probabilities must be added. If you want to know the P of events A and B (and the two are independent), the probabilities must be multiplied.
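
The add/multiply rules can be checked by listing every outcome of two coin flips. This is only a sketch: the helper `prob` and the chosen events are illustrative assumptions, and the rules hold here because the two flips are independent and the "or" events cannot both happen.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
flips = list(product("HT", repeat=2))


def prob(event) -> Fraction:
    """Fraction of outcomes for which event(outcome) is true."""
    return Fraction(sum(1 for outcome in flips if event(outcome)), len(flips))


# AND: first flip heads and second flip tails (independent, so multiply).
p_first_heads = prob(lambda o: o[0] == "H")
p_second_tails = prob(lambda o: o[1] == "T")
p_heads_then_tails = prob(lambda o: o == ("H", "T"))
print(p_heads_then_tails, p_first_heads * p_second_tails)  # 1/4 1/4

# OR: two heads or two tails (cannot both occur, so add).
p_hh = prob(lambda o: o == ("H", "H"))
p_tt = prob(lambda o: o == ("T", "T"))
p_hh_or_tt = prob(lambda o: o in {("H", "H"), ("T", "T")})
print(p_hh_or_tt, p_hh + p_tt)  # 1/2 1/2
```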
  3. What is the value in a set that appears the most times?
    Mode

    - a set can be multimodal (more than one mode).
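
As a quick illustration, Python's standard `statistics.multimode` returns every most-frequent value, which covers the multimodal case. The second data set is made up for the example.

```python
from statistics import multimode

single_mode = multimode([3, 3, 5, 8, 11])
two_modes = multimode([1, 1, 2, 2, 3])

print(single_mode)  # [3]
print(two_modes)    # [1, 2]
```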
  4. What is the value that lies exactly in the center of an ordered distribution?
    Median

    - there are an equal number of values greater than and less than the median.

    e.g. {3,3,5,8,11}

    median = 5
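
The example above can be checked with the standard `statistics.median`. The even-sized list is an added illustration: with an even count, the function averages the two middle values.

```python
from statistics import median

odd_count = median([3, 3, 5, 8, 11])
even_count = median([3, 3, 5, 8])  # average of the two middle values, 3 and 5

print(odd_count)   # 5
print(even_count)  # 4.0
```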
  5. What is the correlation coefficient?
    Correlation coefficient r indicates whether two sets of data are associated or correlated.

    • r ranges from -1.0 to 1.0
    • the larger the absolute value of r, the stronger the association.

    e.g. given two sets of data X and Y

    • a positive value for r indicates that as X increases, Y also tends to increase.
    • a negative value for r indicates that as X increases, Y tends to decrease (and vice-versa).
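
A minimal sketch of how r can be computed, assuming the card means Pearson's correlation coefficient (the card does not name the formula). The function name `pearson_r` and the sample data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of X and Y scaled by both spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # Y rises with X: r close to +1
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # Y falls as X rises: r close to -1
print(round(r_up, 3), round(r_down, 3))
```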

  6. A correlation greater than (a)____ is generally described as strong, while a correlation that is less than (b)___ is generally described as weak.
    a) 0.8

    b) 0.5
  7. What is the term for the dispersion of values around the mean?
    Standard deviation
  8. What is variance?

    What are the 3 steps to calculate variance?
    This is another measure of how far a set of numbers is spread out (how far from the mean).

    - this is also defined as the square of the SD.


    1. Determine the mean (simple average of all the numbers)

    2. For each number, subtract the mean and square the result (the squared difference)

    3. Determine the average of those squared differences (divide by N for a population, or by n-1 for a sample).
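
The three steps can be sketched directly. The data set is borrowed from the median card, and the result is treated as a population variance (dividing by the count); both choices are assumptions made for illustration.

```python
data = [3, 3, 5, 8, 11]

# Step 1: the mean (simple average of all the numbers).
mean = sum(data) / len(data)

# Step 2: subtract the mean from each number and square the result.
squared_differences = [(x - mean) ** 2 for x in data]

# Step 3: average the squared differences (population variance).
variance = sum(squared_differences) / len(data)

print(mean)                 # 6.0
print(squared_differences)  # [9.0, 9.0, 1.0, 4.0, 25.0]
print(variance)             # 9.6
```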
  9. This indicates the dispersion of values around the mean.
    Standard Deviation (SD)
  10. For normally distributed data, what percentage of data values falls within each number of standard deviations (SD) of the mean?
    ±1 SD = 68%

    ±2 SD = 95%

    ±3 SD = 99.7%
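
These percentages can be reproduced from the standard normal curve, assuming the data are normally distributed. The helper `share_within` is an illustrative name. The sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"{k} SD: {share_within(k):.1%}")
# 1 SD: 68.3%
# 2 SD: 95.4%
# 3 SD: 99.7%
```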
  11. What is the formula for:

    a) Sample mean?
    b) Population mean?
    a) $\bar{x} = \dfrac{\sum x}{n}$

    b) $\mu = \dfrac{\sum x}{N}$

    $\sum$ = the sum of

    $\sum x$ = the sum of all data values

    n = the # of data items in the sample

    N = the # of data items in the population
  12. What is the formula for:

    a) Sample SD?
    b) Population SD?
    Reminder: SD is a measure of the dispersion of a set of data from its mean.

    a) $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

    b) $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

    x = each value in the sample or population

    n-1 = "degrees of freedom"; it is used to compensate for the fact that the more accurate population mean ($\mu$) is usually unknown.

    - the sample mean ($\bar{x}$) may differ from the population mean ($\mu$, which is usually unknown); dividing by n-1 instead of n widens the estimated spread slightly to compensate, and the gap between $\mu$ and $\bar{x}$ shrinks as the sample size increases.
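
A short sketch contrasting the two divisors, reusing the data set from the median card as an assumed example. The hand-computed values are compared against `statistics.stdev` (sample, n-1) and `statistics.pstdev` (population, N).

```python
from math import sqrt
import statistics

data = [3, 3, 5, 8, 11]
n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)

population_sd = sqrt(sum_of_squares / n)    # divide by N
sample_sd = sqrt(sum_of_squares / (n - 1))  # divide by n-1 (degrees of freedom)

print(round(population_sd, 3), round(statistics.pstdev(data), 3))
print(round(sample_sd, 3), round(statistics.stdev(data), 3))
```

The sample SD comes out slightly larger than the population SD, which is the effect of dividing by n-1.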
  13. What is the Coefficient of Variation (CV)?
    This describes how large the SD is relative to the mean.

    - The closer this is to zero, the greater the uniformity of the data (the SD is small relative to the mean, thus little variation).

    - The closer CV is to 1 (or beyond), the greater the variability of the data.

    $CV = \dfrac{s}{\bar{x}}$   s = SD   $\bar{x}$ = sample mean
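
A minimal sketch of the CV as sample SD divided by sample mean. The function name and both data sets are made up for illustration.

```python
import statistics


def coefficient_of_variation(values) -> float:
    """Sample SD divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


uniform_data = [99, 100, 101, 100, 100]  # tightly clustered around the mean
spread_data = [3, 3, 5, 8, 11]           # more spread relative to the mean

cv_uniform = coefficient_of_variation(uniform_data)
cv_spread = coefficient_of_variation(spread_data)
print(round(cv_uniform, 3), round(cv_spread, 3))
```

The tightly clustered set gives a CV near zero, while the more spread-out set gives a noticeably larger one.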
  14. What is a permutation of a set?
    It is an ordered arrangement of the elements in that set.

    - the # of permutations of n objects is n! --> n "factorial"

    n! = (n)(n-1)(n-2)(n-3)...(3)(2)(1)

    0! = 1

    • n! counts the number of possible ordered arrangements, each a different outcome.
    • use it when the order of outcomes matters.
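
As a quick check that n objects have n! ordered arrangements, the sketch below lists every ordering of three illustrative items and compares the count with `math.factorial`.

```python
from itertools import permutations
from math import factorial

items = ["A", "B", "C"]
arrangements = list(permutations(items))  # ABC, ACB, BAC, BCA, CAB, CBA

print(len(arrangements))      # 6
print(factorial(len(items)))  # 6
print(factorial(0))           # 1
```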
  15. What is the general permutation formula for calculating permutations of n things taken r at a time?
    • --> $_nP_r$
    • --> $_nP_r = \dfrac{n!}{(n - r)!}$

    Note: factors common to the factorials in the numerator and denominator may be cancelled, simplifying the calculation --> TIME SAVED!
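
A sketch of the cancellation shortcut, assuming the standard formula n!/(n-r)!. The helper name `n_p_r` and the numbers 10 and 3 are illustrative; `math.perm` is the standard-library equivalent.

```python
from math import factorial, perm


def n_p_r(n: int, r: int) -> int:
    """Permutations of n things taken r at a time: n! / (n - r)!."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // factorial(n - r)


# 10!/7!: the factors 7, 6, ..., 1 cancel, leaving only 10 * 9 * 8.
print(n_p_r(10, 3))  # 720
print(10 * 9 * 8)    # 720
print(perm(10, 3))   # 720
```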
  16. a) When to use combinations?

    b) What is the basic formula for combinations?
    a) - when order of the outcome does not matter.

    - there are thus fewer combinations than permutations.

    b) $_nC_r = \dfrac{n!}{r!\,(n - r)!}$
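
A sketch comparing combinations with permutations, assuming the standard formula n!/(r!(n-r)!). The helper name `n_c_r` and the five-letter example are illustrative.

```python
from itertools import combinations
from math import comb, factorial, perm


def n_c_r(n: int, r: int) -> int:
    """Combinations of n things taken r at a time: n! / (r! (n - r)!)."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // (factorial(r) * factorial(n - r))


pairs = list(combinations("ABCDE", 2))  # AB and BA count as one pair

print(n_c_r(5, 2))  # 10
print(comb(5, 2))   # 10
print(len(pairs))   # 10
print(perm(5, 2))   # 20, since order matters for permutations
```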

  17. Formula for z-test used for single mean?
    $z = \dfrac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$

    • $\bar{x}$ = sample mean
    • $\mu_0$ = hypothesized population mean
    • $\sigma$ = population SD (given)
    • n = sample size

    • $\sigma_{\bar{x}}$ = standard error of the mean (SD of the sampling distribution, i.e. the distribution of the means of multiple samples)
    •      = $\dfrac{\sigma}{\sqrt{n}}$

    Main question: is this z-test value in the non-rejection region or in the rejection region?
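
A minimal sketch of the single-mean z-test. The sample figures, the significance level alpha = 0.05, and the two-tailed rejection rule are all assumptions added for illustration; the card itself gives none of them.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, hypothesized_mean, population_sd, n):
    """z = (sample mean - hypothesized mean) / (population SD / sqrt(n))."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical numbers: sample mean 52, hypothesized mean 50, sigma 10, n 100.
z = z_statistic(52.0, 50.0, 10.0, 100)

alpha = 0.05                                  # assumed significance level
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff, about 1.96
in_rejection_region = abs(z) > critical

print(z)                    # 2.0
print(round(critical, 2))   # 1.96
print(in_rejection_region)  # True
```

Under these assumptions the z value of 2.0 lies just beyond the cutoff, so it falls in the rejection region.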
