The flashcards below were created by user larry.gish89 on FreezingBlue Flashcards.

bimodal distribution
If data have exactly two modes

multimodal distribution
If data have more than two modes
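
A minimal sketch of how the two definitions above could be checked, assuming Python's standard `statistics` module. The function name `classify_modes` and the sample data are hypothetical, chosen only for illustration.

```python
from statistics import multimode

def classify_modes(data):
    """Label a data set by how many modes (most frequent values) it has."""
    modes = multimode(data)  # every value tied for the highest frequency
    if not modes:
        raise ValueError("data must not be empty")
    if len(modes) == 1:
        return "unimodal"
    if len(modes) == 2:
        return "bimodal"
    return "multimodal"

# Hypothetical data: 2 and 4 each appear twice, more often than any other value.
print(classify_modes([1, 2, 2, 3, 4, 4, 5]))
```

Note that a data set in which every value occurs exactly once would be labeled "multimodal" by this sketch, although such data are often described as having no mode.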

arithmetic mean
 An important property of the arithmetic mean is that the sum of the deviations from the
 mean will always equal 0.
 The arithmetic mean is sensitive to extreme values. We often refer to these extreme values
 (whether small or large) as outliers.
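
Both properties can be seen in a short sketch. The scores below are hypothetical values picked for illustration, not data from the flashcards.

```python
from statistics import mean

# Hypothetical test scores.
scores = [70, 75, 80, 85, 90]
m = mean(scores)

# The deviations from the mean sum to zero (up to floating-point rounding).
deviations = [x - m for x in scores]
deviation_sum = sum(deviations)

# Adding a single extreme value (an outlier) pulls the mean upward.
scores_with_outlier = scores + [200]
m_outlier = mean(scores_with_outlier)

print(m, deviation_sum, m_outlier)
```

With one outlier added, the mean moves from 80 to 100 even though five of the six values are 90 or below.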

Organization of Frequency and Relative Frequency Distributions
 Step 1: Determine the number of class intervals (also referred
 to as classes or categories, and meaning ranges of data values)
 of interest. Between 5 and 20 class intervals are generally recommended. Class intervals
 should be selected so that each data point (or individual value of raw data) can fall into
 only one category.
 Step 2: When it is desired that all class intervals be of equal
 width, you determine the width of the class by subtracting the smallest data value from
 the largest and then dividing by the number of class intervals desired.
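
The two steps above can be sketched as follows. The raw data and the choice of 5 classes are assumptions made for illustration. Putting the largest value into the last class is one possible convention for keeping every data point in exactly one class.

```python
# Hypothetical raw data and number of classes.
data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 49]
num_classes = 5

# Step 2: width = (largest value - smallest value) / number of classes.
lowest = min(data)
width = (max(data) - lowest) / num_classes

# Count the values in each class [lower, upper); the largest value is
# assigned to the last class so that no data point is left out.
frequencies = [0] * num_classes
for x in data:
    index = min(int((x - lowest) / width), num_classes - 1)
    frequencies[index] += 1

# Relative frequency = class frequency / total number of data points.
relative_frequencies = [f / len(data) for f in frequencies]

print(width, frequencies, relative_frequencies)
```

In practice the computed width is often rounded up to a convenient number; this sketch uses the unrounded value.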

Weighted Mean
Used when you want to weight the data points by their relative importance rather than equally.
 (95 × 0.10) + (70 × 0.10) + (60 × 0.10) + (85 × 0.30) + (90 × 0.40) = 84 (weighted mean)
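
The same calculation as a short sketch, using the scores and weights from the example above. Dividing by the sum of the weights is an added safeguard for weights that do not sum to 1; here they do.

```python
scores = [95, 70, 60, 85, 90]
weights = [0.10, 0.10, 0.10, 0.30, 0.40]

# Multiply each value by its weight, add the products, and divide by the
# total weight.
weighted_mean = sum(x * w for x, w in zip(scores, weights)) / sum(weights)

print(round(weighted_mean, 2))
```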

dispersion or variability

range
... is one measure of dispersion of a data set.
range = largest data value – smallest data value
 The problem with the range is that it includes only two numbers of the data set and
 ignores the rest of the values.

Percentiles
 Percentiles of a ranked data set divide it into hundredths, or 100 equal parts
 of the data values. The median is the 50th percentile. Fifty percent of the data falls
 below the median and 50 percent falls above it.
 Quartiles are the percentiles that divide the data into quarters (or fourths).
 There are three quartiles, then, at the 25th, 50th, and 75th percentiles. We often refer
 to these as Q1, Q2, and Q3, respectively.

How we calculate percentiles:
 1. Arrange the data in ascending order from the smallest value to the largest.
 2. Compute the index i:
 i = (p / 100) × n
 where
 i is the position number of the percentile you're interested in
 p is the percentile you're interested in knowing
 n is the number of items in the data set
 3. If i is not an integer, round up: the next integer value greater than i
 denotes the position of the pth percentile.
 If i is an integer, the pth percentile is the average of the data values
 in positions i and i + 1.
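
A minimal sketch of the three steps above, assuming the index is computed as i = (p / 100) × n. The function name `percentile` and the data are hypothetical. Other textbooks and libraries use slightly different percentile conventions, so results may differ from theirs.

```python
import math

def percentile(data, p):
    """Return the pth percentile (0 < p < 100) using the index rule."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100, exclusive")
    ordered = sorted(data)            # step 1: ascending order
    n = len(ordered)
    i = p * n / 100                   # step 2: index (a 1-based position)
    if i.is_integer():
        # step 3, integer case: average the values in positions i and i + 1
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    # step 3, non-integer case: round up to the next position
    return ordered[math.ceil(i) - 1]

# Hypothetical data set with n = 8.
data = [15, 20, 25, 25, 27, 28, 30, 34]
print(percentile(data, 50))   # median: positions 4 and 5 are averaged
print(percentile(data, 90))   # i = 7.2, so position 8 is used
```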

Interquartile Range
 The interquartile range is the 75th percentile minus the 25th percentile, or Q3 – Q1.
 This range is less affected by outliers than the range previously discussed.
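
To illustrate the point about outliers, the sketch below compares the range and the interquartile range on a hypothetical data set, with and without one extreme value. The helper `percentile` is an illustrative implementation of the index rule i = (p / 100) × n, which is an assumption of this sketch.

```python
import math

def percentile(data, p):
    """pth percentile via the index rule i = (p / 100) * n."""
    ordered = sorted(data)
    i = p * len(ordered) / 100
    if i.is_integer():
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(i) - 1]

def data_range(data):
    return max(data) - min(data)

def iqr(data):
    return percentile(data, 75) - percentile(data, 25)

# Hypothetical data; the second list replaces the largest value with an outlier.
data = [15, 20, 25, 25, 27, 28, 30, 34]
with_outlier = [15, 20, 25, 25, 27, 28, 30, 340]

print(data_range(data), iqr(data))
print(data_range(with_outlier), iqr(with_outlier))
```

The range grows from 19 to 325 when the outlier is introduced, while the interquartile range stays at 6.5.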

variance of a data set
 is an important measure of dispersion
 within a data set because it takes into account all the data values

The variance of the population
 is the average of the squared deviations from the
 arithmetic mean. When you take the variance of a sample, you divide the squared deviations
 from the sample mean by the sample size minus 1. Doing this generally gives a better
 estimate of the population variance from which the sample comes.

We denote the variance of a population with...
σ² ("little sigma squared")

We denote the sample variance
s² ("s squared")

The population mean and the population variance are called
 parameters of a population because they are quantities that are fixed for any given
 population.

We call the sample mean and the sample variance
 sample statistics (or random variables) because they vary from one sample to another,
 inasmuch as their values depend on which sample is selected.

Use the following steps to calculate the sample variance:
 1. Calculate the sample mean.
 2. Calculate the difference between each observation and the sample mean.
 3. Square each difference found in step 2.
 4. Sum the squared differences found in step 3.
 5. Divide the sum of the squared differences by the sample size minus one, n – 1.
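
The five steps above can be sketched as follows. The sample values are hypothetical. The result is compared with `statistics.variance`, which also divides by n – 1, and with `statistics.pvariance`, which divides by n as in the population formula.

```python
from statistics import pvariance, variance

sample = [4, 8, 6, 5, 7]  # hypothetical observations
n = len(sample)

sample_mean = sum(sample) / n                      # step 1
differences = [x - sample_mean for x in sample]    # step 2
squared = [d ** 2 for d in differences]            # step 3
sum_of_squares = sum(squared)                      # step 4
sample_variance = sum_of_squares / (n - 1)         # step 5

print(sample_variance, variance(sample), pvariance(sample))
```

Dividing by n – 1 gives 2.5 for this sample; dividing by n instead, as for a population, would give 2.0.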

Frequently, we use the ________ _________ instead of the variance to
describe dispersion.
 the standard deviation. You get the standard deviation by taking the square root of the
 variance. (Sample variance = s²; population variance = σ², "little sigma squared.")
 The advantage of using the standard deviation is that it has the same units of
 measurement as the data values.

We represent this statistic with s, meaning "the square root of the variance, s²."
This is the representation for standard deviation.

What does the standard deviation actually mean?
 The standard deviation shows how the
 data points are distributed or dispersed about the sample mean. When the things you are
 measuring are alike, such as test scores from the same class, the bigger the standard
 deviation, the more dispersion you have about the mean.
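
As an illustration, the sketch below compares two hypothetical classes whose test scores have the same mean but different spread. The standard deviation is computed as the square root of the sample variance.

```python
import math
from statistics import mean, variance

# Hypothetical test scores for two classes, both with a mean of 80.
class_a = [78, 80, 82, 79, 81]
class_b = [60, 70, 80, 90, 100]

# Standard deviation = square root of the sample variance.
sd_a = math.sqrt(variance(class_a))
sd_b = math.sqrt(variance(class_b))

print(mean(class_a), round(sd_a, 2))
print(mean(class_b), round(sd_b, 2))
```

Class B has the larger standard deviation, which matches the visibly wider spread of its scores around 80.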

Coefficient of Variation
 When two (or more) distributions have the same mean, the one with the largest standard
 deviation has the most variation. But what about when distributions have different means?
 In that case, you can't compare just the standard deviations. Instead, you have to compare
 the coefficient of variation (CV) for each distribution as well. The distribution with the
 highest CV has the most dispersion.
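
A minimal sketch of such a comparison, assuming the usual definition of the CV as the standard deviation divided by the mean, expressed as a percentage. The function name and both data sets are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV in percent: sample standard deviation / mean * 100 (assumed definition)."""
    return stdev(data) / mean(data) * 100

# Hypothetical data sets with different means.
small_values = [10, 12, 14]
large_values = [100, 105, 110]

sd_small, sd_large = stdev(small_values), stdev(large_values)
cv_small = coefficient_of_variation(small_values)
cv_large = coefficient_of_variation(large_values)

print(sd_small, round(cv_small, 2))
print(sd_large, round(cv_large, 2))
```

The second data set has the larger standard deviation, but the first has the larger CV and therefore more dispersion relative to its mean.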

Empirical Rule
 This rule applies to data that are approximately normally distributed, that is, data with a
 bell-shaped symmetrical distribution. About 68 percent of the data points will fall within
 one standard deviation of the mean, and about 95 percent of the data points will be within
 two standard deviations of the mean.
 For example, let's continue with our inquiry into salaries but with a different
 profession. Let's take a sample of the salaries of 150 production workers. Here's a
 distribution of salaries we might find.
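
The 68 and 95 percent figures can be checked against a normal distribution with `statistics.NormalDist` (Python 3.8 or later). The salary mean and standard deviation below are hypothetical numbers used only to show how the intervals are formed; they are not taken from the flashcards.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)   # roughly 0.68
within_two = share_within(2)   # roughly 0.95

# Hypothetical salary mean and standard deviation.
mean_salary, sd_salary = 40_000, 5_000
one_sd_interval = (mean_salary - sd_salary, mean_salary + sd_salary)
two_sd_interval = (mean_salary - 2 * sd_salary, mean_salary + 2 * sd_salary)

print(one_sd_interval, round(within_one, 4))
print(two_sd_interval, round(within_two, 4))
```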

Chebyshev's Theorem
 tells us
 the minimum proportion of data points that lie within any number of standard deviations
 from the mean, regardless of the shape of the distribution. Chebyshev's theorem states:
 At least 1 − 1/k² of
 the measurements fall within k standard deviations from the mean.
 Note: k must be greater than 1.
 For example, if you want to find out the minimum percentage of the data values that are
 within 2 standard deviations from the mean, you'd calculate:
 1 − 1/2² = 1 − 1/4 = 0.75
 That is, for any data set, at least 75 percent of the data values are within two
 standard deviations from the mean.
 If you calculate the minimum percentage of values that are within three
 standard deviations of the mean, you'll get 1 − 1/3² = 8/9, or "at least 89
 percent."
 Chebyshev's theorem provides only lower bounds for the percentage of
 data values that lie within k (where k > 1) standard deviations from the
 mean; it doesn't provide exact percentages. The power of Chebyshev's theorem lies
 in the fact that it is true for any distribution, regardless of its shape.
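
A short sketch of the bound, assuming the standard form 1 − 1/k². The function name `chebyshev_bound` and the skewed data set are hypothetical. The check uses the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Hypothetical, strongly right-skewed data.
data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]
m, sd = mean(data), pstdev(data)

# Observed proportion of values within two standard deviations of the mean.
within_two = sum(1 for x in data if abs(x - m) <= 2 * sd) / len(data)

print(chebyshev_bound(2), round(chebyshev_bound(3), 3), within_two)
```

Even for this skewed data set, the observed proportion within two standard deviations (0.9) is above the guaranteed minimum of 0.75.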

The ____ _ ______ compares the standard deviation relative
to the mean of the distribution. For this reason, the CV is also known as the ______ ______ _____ (RSE).
 coefficient of variation;
 relative standard error

Here's how we calculate the CV...
 CV = (standard deviation / mean) × 100 percent
 Think of the CV for any variable as the precision of the mean for that variable. Many
 federal agencies, such as the National Center for Health Statistics (NCHS), use the CV as
 a measure of the precision or reliability of estimates of health characteristics. The
 smaller the CV, the more reliable (precise) the estimate is. The larger the CV, the more
 unreliable it is.

Shapes of distributions
 1. Symmetrical Distribution: Has the same center value for the mean, median, and mode
 (mirrored appearance).
 2. Uniform or Rectangular Distribution: Every class has the same frequency.
 3. Skewed Distribution: One "tail" is longer than the other.
 If the longer tail is on the left, we say that the distribution is skewed to the
 left, or negatively skewed. If the longer tail is on the right, we say the
 distribution is skewed to the right, or positively skewed.
 4. Bimodal Distribution: A histogram in which the two classes with the largest
 frequencies are separated by at least one class; the two peak frequencies need not
 be equal.

