Statistics Exam 2

Card Set Information

Statistics Exam 2
2014-03-12 20:20:47
Statistics Stats
Chapter 3-4

  1. THE MEAN:
    • The arithmetic average: "The balance point" of the distribution; computed by adding all the scores in the distribution and dividing by the number of scores.
    • -μ (mu): Population mean
    • -M: Sample mean
  2. "Deviation"
    an error; the distance from a score to the mean.

    -The sum of the errors or deviations around the mean is always 0.
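
    The zero-sum property of deviations can be checked with a short sketch. The scores below are hypothetical, chosen only for illustration.

```python
# Hypothetical scores chosen for illustration (not from the card set).
scores = [2, 4, 6, 8, 10]

mean = sum(scores) / len(scores)          # balance point of the distribution
deviations = [x - mean for x in scores]   # signed distance of each score from the mean

print(mean)             # 6.0
print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```

    The negative deviations below the mean exactly cancel the positive deviations above it, which is why the mean acts as the balance point.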
  3. Advantage of Mean:
    -when to use Mean?
    • 1. More informative than median & mode
    • -Takes all the observation/scores into account; highest and lowest scores are accounted for
    • -Takes the distance & direction of deviations/errors into account.

    2. More uses than median & mode: Necessary for calculating many inferential statistics.

    3. ONLY Use the mean for unimodal, symmetrical distributions of interval/ratio level data.
  4. Limitations of Mean:
    • 1. Not always possible to calculate a mean (scale); Mean can only be calculated for interval/ratio level data
    • -Need a different measure for nominal or ordinal level data

    • 2. Not always appropriate to use the mean to describe the middle of a distribution (distribution)
    • -Mean is sensitive to extreme values or “outliers”
    • -Mean does not always reflect where the scores “pile up”
    • -Need a different measure for asymmetrical distributions: positively or negatively skewed distributions
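
    A minimal sketch of the mean's sensitivity to outliers, using Python's standard `statistics` module. Both lists are hypothetical; the second swaps the top score for an extreme value.

```python
from statistics import mean, median

# Hypothetical scores; the second list replaces the top score with an extreme value.
typical = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 70]

# Without the outlier, the mean and the median agree (both 5).
print(mean(typical), median(typical))

# One extreme score pulls the mean up to 17.6, while the median stays at 5.
print(mean(with_outlier), median(with_outlier))
```

    With the outlier present, the mean no longer sits where the scores pile up, so the median describes the middle better.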
  5. **When to use the Mean:
    Use the mean for unimodal, symmetrical distributions of interval/ratio level data.
  6. THE MEDIAN:
    • -The middle score.
    • -the goal is to find the midpoint "score" of the distribution; it can either equal a score or be the point between two middle scores

    • -Calculate by: Order the scores from least to greatest, find middle score or avg the two middle scores
    • -Divides the distribution exactly in half; 50th percentile
    • -The middle score is easiest to identify when there is an odd # of scores & no “pileup” or ties at the middle.
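
    The "order the scores, then take the middle" procedure can be sketched as follows. This is the simple method only; it assumes no special handling of ties at the middle, and the scores are hypothetical.

```python
def median(scores):
    """Return the middle score, or the average of the two middle scores."""
    ordered = sorted(scores)      # order from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: a single middle score
        return ordered[mid]
    # even count: the midpoint lies between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5, 3, 9]))   # 5
print(median([1, 3, 5, 7]))      # 4.0
```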
  7. Advantage of The Median:
    -which scales can it be used with?
    -what position does it have in skewed distributions
    1. Insensitive to extreme values; Can be used when extreme values distort the mean

    2. The most central, representative value in skewed distributions

    3. Can be calculated when the mean cannot

    4. Can be used with ranks/ordinal scales (as well as interval/ratio data)

    5. Can be used with open-ended distributions; Example: # of siblings (5+ siblings?)
  8. Limitation of The Median:
    -which scale can't be used?
    -informative ?
    • 1. Not as informative as the mean; just cuts the scores in half
    • 2. Takes only the observations/scores around the 50th percentile into account.
    • 3. Provides no information about distances between observations.
    • 4. Fewer uses than the mean; Median is purely descriptive (rather than inferential)
    • 5. Not always possible to calculate a median (scale); can only be calculated for ordinal & interval/ratio data
  9. When to use the median:
    Use the median when you cannot calculate a mean (ordinal scales, open-ended distributions) or when distributions of interval/ratio data are skewed by extreme values.
  10. THE MODE:
    The most frequently occurring score(s); describes where the scores pile up.
  11. Advantages of the Mode:
    -which scales? how does this compare with mean/median?
    1. Simple to find

    2. Can be used with ANY scale of measurement; Median can only be calculated for ordinal & interval/ratio data vs. Mean can only be calculated for interval/ratio data

    3. Can be used to indicate >1 most frequent value; Use to indicate bimodality, multimodality
  12. Limitations of the Mode:
    • 1. Not as informative as the mean or median;
    • -Takes only the most frequently observed X values into account.
    • -Provides no information about distances between observations or the # of observations above/below the mode.

    • 2. Fewer uses than the mean: Mode is purely descriptive.
    • -Need to calculate a mean to use with inferential statistics.
  13. When to use Mode:
    • -Use the mode when you cannot compute a mean or median,
    • -or with the mean/median to describe a bimodal/multimodal distribution

    • Note:
    • Symmetrical: Mean=median=mode
    • Bimodal (symmetrical): mean & median are in the middle, between the two modes

    • -Use with nominal scales,
    • -or to describe the shape of the distribution.
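
    A short sketch of finding the mode with `statistics.multimode` from Python's standard library (available in Python 3.8+). The data are hypothetical: one nominal variable and one set of scores with two pile-ups.

```python
from statistics import multimode

# Hypothetical data for illustration.
majors = ["psych", "bio", "psych", "art", "bio", "psych"]   # nominal scale
scores = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7]                  # two pile-ups

print(multimode(majors))  # ['psych']
print(multimode(scores))  # [2, 6]
```

    The first call works even though no mean or median exists for category labels; the second returns both modes, which flags the distribution as bimodal.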
  14. Central Tendency
    A statistical measure that attempts to determine the single value, usually located in the center of a distribution, that is most typical or most representative of the entire set of scores
  15. Describing distributions:
    To describe/summarize a distribution of scores efficiently, you need: A measure of central tendency + a measure of variability.
  16. Which measure of central tendency is most appropriate
    Central tendency & variability measures are “partners.”


    Mode--> range

    Median--> interquartile range, semi-interquartile range

    Mean--> SS, variance (σ² or s²), standard deviation (σ or s)

    --> These measures describe distributions & indicate how well individual scores or samples of scores represent the population.
  17. Variability measures used with the mode & median--> Range:
    -Used for what categories?
    -what scales?
    -when is range used?
    Range (For Mode): Based on the distance between the highest & lowest observations on the X scale. Only takes the 2 most extreme observations into account.

    For interval/ratio data, range = highest score - lowest score

    --Range can also be used for ordered categories: Ex: from “agree” to “disagree strongly,” with modal response = “disagree.”

    --The range is typically used with the mode, when the mean & median are inappropriate or impossible to calculate (but may be reported along with a median or a mean).
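
    Both uses of the range can be sketched briefly. The scores, the five-point response scale, and the responses are all assumptions made up for illustration.

```python
# Interval/ratio data: range = highest score - lowest score.
scores = [12, 15, 9, 20, 14]          # hypothetical scores
score_range = max(scores) - min(scores)
print(score_range)                     # 11

# Ordered categories: report the lowest and highest categories observed.
scale = ["agree strongly", "agree", "neutral", "disagree", "disagree strongly"]  # assumed scale
responses = ["disagree", "agree", "disagree", "disagree strongly", "neutral", "disagree"]
ranks = [scale.index(r) for r in responses]   # position of each response on the scale
print(scale[min(ranks)], "to", scale[max(ranks)])   # agree to disagree strongly
```

    In both cases only the two most extreme observations matter; every score in between could change without affecting the range.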
  18. Variability measures used with the mean
    -used with what scales?
    -most useful for what kind of distribution?
    -SS, variance & standard deviation are based on distances between each of the scores & the mean on the X scale. All scores are taken into account, as with the mean.

    -Use only with interval/ratio data.

    -Most useful for symmetrical distributions, when the mean is the best measure of central tendency.
  19. Summarizing Deviations: Three Steps!
    • 1. SS
    • 2. Variance
    • 3. Standard Deviation
  20. ==> STEP 1:<==
    SS = the sum of squared deviations: SS = ∑(X-μ)²
  21. ***So, what does SS tell you about the variability of the distribution of scores?
    -Relationship b/w variability and SS?
    -Relationship b/w N and SS
    -Why is SS not a good descriptive statistic?
    1. SS summarizes the amount of deviation & is useful for further analyses.

    2. As variability increases (more differences between scores, larger deviations) SS gets larger

    3. Extreme scores farther from μ (population mean) contribute disproportionately more to SS because their larger deviations are squared

    • 4. As N increases (more squared deviations to sum) SS gets larger. Because SS increases with N, SS is NOT a good descriptive statistic
    • -->You can’t compare SS between groups of different sizes.
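
    The way SS responds to both variability and N can be seen in a small sketch. The function name `sum_of_squares` and the three score lists are hypothetical.

```python
def sum_of_squares(scores):
    """SS: sum of squared deviations around the mean of the scores."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

small = [2, 4, 6]            # hypothetical scores
doubled = small * 2          # same spread, twice as many scores
spread_out = [0, 4, 8]       # same N as `small`, more variability

print(sum_of_squares(small))       # 8.0
print(sum_of_squares(doubled))     # 16.0
print(sum_of_squares(spread_out))  # 32.0
```

    `doubled` has exactly the same spread as `small`, yet its SS is twice as large, which is why SS cannot be compared across groups of different sizes.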
  22. STEP 2 of Summarizing Deviations:
  23. **How can you use SS to create a measure that will allow you to compare different-sized groups?
  24. Variance:
    -Is variance affected by N? why or why not?
    -what does variance summarize?
    -why is it not a good descriptive statistic?
    The average squared deviation or “mean squared deviation”: σ² = SS/N

    -Variance is not affected by N, because it is an average of squared deviations

    -Variance summarizes the amount of deviation, allows for comparisons between different-sized groups, & is useful for further analyses.

    -Since variance is a mean of squared deviations, it is not on the same scale as our original variable SO not a good descriptive statistic.
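
    A minimal sketch of σ² = SS/N, treating each hypothetical list as a complete population. The function name `population_variance` is an assumption for illustration.

```python
def population_variance(scores):
    """sigma^2 = SS / N: the mean squared deviation of a whole population."""
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)
    return ss / n

small = [2, 4, 6]        # hypothetical scores: SS = 8, N = 3
doubled = small * 2      # same spread: SS = 16, N = 6

print(population_variance(small))    # 8/3, about 2.667
print(population_variance(doubled))  # 16/6, about 2.667
```

    Dividing by N removes the effect of group size, so the two groups are now comparable. The result is still in squared units of the original scores.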
  25. **How can you use σ² to create a descriptive measure of variability on the same scale as the original scores?
    -why do we need standard deviation?
    SS =the sum of the squared deviations = ∑(X-μ)²

    σ² (Population variance) the average squared deviation = SS/N

    What we’d really like to have is a measure of the typical or average deviation from the mean that is NOT based on squared quantities; which is why we need Standard Deviation (takes square root of variance)
  26. Standard Deviation: Population vs Sample
    The typical or expected deviation; the average or “expected” distance that scores deviate from the mean.
  27. Population Standard Deviation σ= √σ²
    Sample Standard Deviation: s = √s²
  28. **Why take the square root of variance? (The result is Standard Deviation)
    Square root of variance “returns” the measure of variability to the original units of measurement.

    This allows you to represent standard deviation as a distance on the X axis.

    This also allows you to make statements about how extreme or unusual an observation is.
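
    The step from variance back to the original units can be sketched as follows, treating a hypothetical set of 8 scores as a whole population.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical population of N = 8 scores
n = len(scores)
mu = sum(scores) / n                                  # 5.0
variance = sum((x - mu) ** 2 for x in scores) / n     # 4.0, in squared units
sigma = math.sqrt(variance)                           # 2.0, back in the original units

# How many standard deviations is a score of 9 from the mean?
print((9 - mu) / sigma)               # 2.0
```

    Because σ is a distance on the X axis, a score of 9 can be described as 2 standard deviations above the mean, a statement about how unusual it is.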
  29. Standard Deviation:
    Most useful for? Descriptive or Informative?
    Note: As with the mean, standard deviation is most useful for describing symmetrical distributions.

    Standard deviation is the best descriptive measure of variability around a mean;

    SS & variance are important concepts for understanding & for use in further analyses.
  30. Sample variance & standard deviation
    -why is it important?
    We often want to make statements about population parameters. BUT, much of the time we only have access to sample statistics.

    Our best estimate of μ (population mean) = M (Sample Mean) calculated using sample data.
  31. **Compare Sample & population SS:
    They are the same: the calculations do NOT change from population to sample. If you use the definitional formula, use the correct mean. This will matter later on…

    Population: use μ (population mean) vs. Sample: M (Sample Mean)
  32. **Comparing sample & population variances: 
    -which yields a larger value? How is this corrected?
    Calculations DO change from population to sample.

    • Population: N is denominator vs. Sample: Uses n-1 as denominator
    • Therefore, for the same SS, the sample formula will always yield a larger value.
    • (n-1) instead of N corrects for bias in calculating s & s².
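
    Python's standard `statistics` module provides both formulas, which makes the difference in denominators easy to see. The scores are hypothetical.

```python
from statistics import pvariance, variance

scores = [1, 3, 5, 7]   # hypothetical scores; mean = 4, SS = 20

pop_var = pvariance(scores)    # population formula: SS / N     = 20 / 4
samp_var = variance(scores)    # sample formula:     SS / (n-1) = 20 / 3

print(pop_var)    # 5
print(samp_var)   # about 6.667
```

    The SS is identical in both calculations; only the denominator changes, so the sample formula gives the larger value.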
  33. **When are Sample stats only useful: 
    -unbiased means?
    -what is M an unbiased estimate of?
    Sample statistics are only useful to the extent that they provide unbiased estimates of population parameters;

    --> UNBIASED: One that on average equals the population parameter.

    M (sample mean) is an unbiased estimate of μ (population mean): The average of many sample means = the population mean.
  34. **Why doesn’t the SS/N formula work with sample data?
    SS/N tends to underestimate population variance when using sample data: samples usually contain less variability than the populations they come from, because they rarely reflect the extremes of the population, so we underestimate the true variability.

    -So dividing by n-1 instead of n makes the denominator smaller, which makes the variance estimate larger to correct for the underestimation.
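
    The bias can be demonstrated exactly on a tiny example by listing every possible sample. This is a sketch under stated assumptions: a hypothetical population of three equally likely scores, samples of n = 2 drawn with replacement, and exact arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

population = [0, 3, 9]   # hypothetical population; every score equally likely
N = len(population)
mu = Fraction(sum(population), N)                      # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance

n = 2
samples = list(product(population, repeat=n))          # all 9 samples, with replacement

def ss(sample):
    """Sum of squared deviations around the sample's own mean M."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

avg_M = sum(Fraction(sum(s), n) for s in samples) / len(samples)
avg_biased = sum(ss(s) / n for s in samples) / len(samples)          # SS / n
avg_unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)  # SS / (n-1)

print(mu, avg_M)                          # 4 4
print(sigma2, avg_biased, avg_unbiased)   # 14 7 14
```

    Averaged over all possible samples, M equals μ and SS/(n-1) equals σ², while SS/n comes out at only half of σ² in this example.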
  35. **Degrees of Freedom:
    • Samples usually contain less variability than the populations they come from so dividing SS by a smaller number corrects for the tendency to underestimate true population variability.
    • The n-1 correction makes s² an unbiased estimator of σ²; s (its square root) is then a much better, though still slightly biased, estimator of σ.

    n-1 is also referred to as “degrees of freedom.”: Sample variances have n-1 degrees of freedom—they are calculated from n-1 independent scores.

    The last score is determined by the other scores & by M.
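
    A small sketch of why only n-1 scores are free to vary once M is known. The sample size, mean, and free scores are hypothetical.

```python
n = 4
M = 5                        # hypothetical sample mean
free_scores = [3, 8, 4]      # any n-1 scores can be chosen freely

# The scores must sum to n * M, so the last score has no freedom left.
last_score = n * M - sum(free_scores)
print(last_score)            # 5

sample = free_scores + [last_score]
print(sum(sample) / n)       # 5.0
```

    Changing any of the three free scores forces a different last score, because the total must stay at n × M.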
  36. Variability:
    Provides a quantitative measure of the differences between scores of a distribution and describes the degree to which the scores are spread out or clustered together.

    • Serves two purposes:
    • 1. Describes the distribution; tells whether the scores are clustered close together or are spread out over a large distance. Usually defined in terms of distance, such as the distance from one score to another.

    • 2. Measures how well an individual score (or group of scores) represents the entire distribution.
    • -Different measures of variability: Range, variance, standard deviation
  37. THE RANGE:
    The distance covered by the scores in a distribution, from the smallest score to the largest score.

    -Not an accurate description of variability because it doesn't take all the scores into account.