The flashcards below were created by a user on FreezingBlue Flashcards.
If data have exactly two modes
- the data set is bimodal.
If data have more than two modes
- the data set is multimodal.
- An important property of the arithmetic mean is that the sum of the deviations from the
- mean will always equal 0.
- The arithmetic mean is sensitive to extreme values. We often refer to these extreme values
- (whether small or large) as outliers.
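Both properties above can be checked in a few lines of Python. This is only a sketch with made-up values; the lists and variable names are illustrative and do not come from the cards.

```python
import statistics

values = [10, 12, 14, 16, 18]
mean = statistics.mean(values)
deviations = [x - mean for x in values]   # -4, -2, 0, 2, 4
total_deviation = sum(deviations)         # the deviations cancel out

# Swap the largest value for an outlier and compare mean and median.
with_outlier = [10, 12, 14, 16, 100]
mean_with_outlier = statistics.mean(with_outlier)      # pulled toward 100
median_with_outlier = statistics.median(with_outlier)  # still the middle value

print(total_deviation, mean_with_outlier, median_with_outlier)
```

The mean jumps from 14 to 30.4 when one value becomes an outlier, while the median stays at 14.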
Organization of Frequency and Relative Frequency
- Step 1: Determine the number of class intervals (also referred
- to as classes or categories, and meaning ranges of data values)
- of interest. Between 5 and 20 class intervals are generally recommended. Class intervals
- should be selected so that each data point (or individual value of raw data) can fall into
- only one category.
- Step 2: When it is desired that all class intervals be of equal
- width, you determine the width of the class by subtracting the smallest data value from
- the largest and then dividing by the number of class intervals desired.
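Steps 1 and 2 can be sketched in Python as follows. The function name `frequency_table`, the sample scores, and the choice of 5 classes are assumptions made for illustration. Putting the largest value in the last class is one common convention, not something the cards specify.

```python
def frequency_table(data, num_classes):
    """Group data into equal-width classes.

    Returns one (low, high, frequency, relative_frequency) row per class.
    Assumes the data are not all identical (the width would be zero).
    """
    smallest, largest = min(data), max(data)
    width = (largest - smallest) / num_classes  # Step 2: equal class width
    counts = [0] * num_classes
    for x in data:
        # Each value falls into exactly one class; the largest value
        # is placed in the last class instead of opening a new one.
        k = min(int((x - smallest) / width), num_classes - 1)
        counts[k] += 1
    n = len(data)
    return [
        (smallest + k * width, smallest + (k + 1) * width, c, c / n)
        for k, c in enumerate(counts)
    ]

scores = [52, 55, 61, 64, 68, 70, 71, 75, 78, 80, 83, 85, 88, 91, 92]
table = frequency_table(scores, 5)
for low, high, freq, rel in table:
    print(f"{low:5.1f} to {high:5.1f}: {freq:2d}  ({rel:.2f})")
```

Here the width is (92 – 52) / 5 = 8, and the relative frequencies add up to 1.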
If you want to weight the data points by their relative importance rather than equally
- (95 × 0.10) + (70 × 0.10) + (60 × 0.10) + (85 × 0.30) + (90 × 0.40) = 84 (weighted mean)
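The same weighted mean can be reproduced in Python. This is a minimal sketch; the function name `weighted_mean` is illustrative, and dividing by the total weight is an added safeguard for weights that do not sum to 1.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

scores = [95, 70, 60, 85, 90]
weights = [0.10, 0.10, 0.10, 0.30, 0.40]
result = weighted_mean(scores, weights)
print(round(result, 2))  # agrees with the 84 above, up to floating-point rounding
```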
dispersion or variability
... is one measure of dispersion of a data set.
range = largest data value – smallest data value
- The problem with the range is that it includes only two numbers of the data set and
- ignores the rest of the values.
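A short Python sketch shows why the range can mislead. The two lists below are made-up examples, and `data_range` is an illustrative name.

```python
def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)

clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread_out = [1, 2, 5, 8, 9]  # values are spread across the interval

print(data_range(clustered), data_range(spread_out))
```

Both lists have a range of 8 even though the values in between are distributed very differently.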
- Percentiles of a ranked data set divide it into hundredths, or 100 equal parts
- of the data values. The median is the 50th percentile. Fifty percent of the data falls
- below the median and 50 percent falls above it.
- Quartiles are the percentiles that divide the data into quarters (or fourths).
- There are three quartiles, then, at the 25th, 50th, and 75th percentiles. We often refer
- to these as Q1, Q2, and Q3, respectively.
How we calculate percentiles:
- 1. Arrange the data in ascending order from the smallest value to the largest.
- 2. Compute the index i:
- i = (p / 100) × n
- i is the position number of the percentile you're interested in
- p is the percentile you're interested in knowing
- n is the number of items in the data set
- 3. If i is not an integer, round up. The next integer value greater than i
- denotes the position of the pth percentile.
- If i is an integer, the pth percentile is the average of the data values
- in positions i and i + 1.
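The three steps above can be sketched in Python. This assumes the index is computed as i = (p / 100) × n, which is the formula that matches the rounding rule in step 3. The function name `percentile` and the data are illustrative. Other textbooks and libraries use different interpolation rules, so results may differ from theirs.

```python
import math

def percentile(data, p):
    """pth percentile using the index rule, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ranked = sorted(data)              # step 1: ascending order
    n = len(ranked)
    i = p * n / 100                    # step 2: index i = (p / 100) * n
    if i != int(i):
        position = math.ceil(i)        # step 3: not an integer, so round up
        return ranked[position - 1]    # positions are 1-based
    i = int(i)
    # i is an integer: average the values in positions i and i + 1
    return (ranked[i - 1] + ranked[i]) / 2

data = [50, 15, 70, 35, 20, 60, 40, 55]
q1 = percentile(data, 25)
q2 = percentile(data, 50)
q3 = percentile(data, 75)
p90 = percentile(data, 90)
print(q1, q2, q3, p90)
```

With these eight values, the 50th percentile averages the 4th and 5th ranked values, which is the usual median for an even-sized data set.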
- The inter-quartile range is the 75th percentile minus the 25th percentile, or Q3 – Q1.
- This range depends less on outliers than does the range previously discussed.
variance of a data set
- is an important measure of dispersion
- within a data set because it takes into account all the data values
The variance of the population
- is the average of the squared deviations from the
- arithmetic mean. When you take the variance of a sample, you divide the sum of the squared
- deviations from the sample mean by the sample size minus 1. Doing this generally gives a
- better estimate of the variance of the population from which the sample comes.
denote the variance of a population with...
σ² ("little sigma squared")
We denote the sample variance
s² (s squared)
The population mean and the population variance are called
- parameters of a
- population because they are quantities that are fixed for any given population.
call the sample mean and the sample variance
- sample statistics (or random
- variables) because they vary from one sample to another, inasmuch as their values depend
- on which sample is selected.
Use the following steps to calculate the sample variance:
- 1. Calculate the sample mean.
- 2. Calculate the difference between each observation and the sample mean.
- 3. Square each difference found in step 2.
- 4. Sum the squared differences found in step 3.
- 5. Divide the sum of the squared differences by the sample size minus one, n – 1.
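The steps above translate directly into Python. This is a sketch with made-up data; the function names are illustrative. The population version is included only to show the difference in the divisor.

```python
import statistics

def sample_variance(data):
    n = len(data)
    mean = sum(data) / n                       # calculate the sample mean
    differences = [x - mean for x in data]     # observation minus mean
    squared = [d ** 2 for d in differences]    # square each difference
    total = sum(squared)                       # sum the squared differences
    return total / (n - 1)                     # divide by n - 1

def population_variance(data):
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)  # divide by n

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), population_variance(data))
```

For this data the squared differences sum to 32, so the sample variance is 32 / 7 and the population variance is 32 / 8 = 4. Python's built-in `statistics.variance` and `statistics.pvariance` give the same two results.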
Frequently, we use the ________ _________ instead of the variance to
- the standard deviation. You get the standard deviation by taking the square root of the
- variance (sample variance = s²; population variance = σ², "little sigma squared").
- The advantage of using the standard deviation is that it has the same units of
- measurement as the data values.
represent this statistic with s, meaning "the square root of the
variance, s² (s squared)."
This is the representation for standard deviation.
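As a quick check, the standard deviation can be computed in Python by taking the square root of the variance. The data are made up for illustration; `statistics.variance` and `statistics.pvariance` are the standard-library sample and population variance functions.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = statistics.variance(data)           # sample variance (divides by n - 1)
s = math.sqrt(s_squared)                        # sample standard deviation

sigma = math.sqrt(statistics.pvariance(data))   # population standard deviation

print(round(s, 3), sigma)
```

Both results are in the same units as the data values, unlike the variance, which is in squared units.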
What does the standard deviation actually mean?
- The standard deviation shows how the
- data points are distributed or dispersed about the sample mean. When the things you are
- measuring are alike, such as test scores from the same class, the bigger the standard
- deviation, the more dispersion you have about the mean.
Coefficient of Variation
- When two (or more) distributions have the same mean, the one with the largest standard
- deviation has the most variation. But what about when distributions have different means?
- In that case, you can't compare just the standard deviations. Instead, you have to compare
- the coefficient of variation (CV) for each distribution as well. The distribution with the
- highest CV has the most dispersion.
Empirical Rule
- This rule applies to data that are approximately normally distributed, that is, a
- bell-shaped symmetrical distribution. About 68 percent of the data points will fall within
- one standard deviation of the mean, and about 95 percent of the data points will be within
- two standard deviations of the mean.
- For example, let's continue with our inquiry into salaries but with a different
- profession. Let's take a sample of the salaries of 150 production workers. Here's a
- distribution of salaries we might find.
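The 68 and 95 percent figures in the rule above can be checked against the normal curve itself. The sketch below uses Python's `statistics.NormalDist` with a standard normal distribution. It does not use the salary sample, and `share_within` is an illustrative name.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

within_one = share_within(1)
within_two = share_within(2)
print(f"{within_one:.4f} {within_two:.4f}")
```

The results are roughly 0.68 and 0.95, matching the percentages quoted for one and two standard deviations.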
Chebyshev's theorem
- tells us
- the minimum proportion of data points that lie within any number of standard deviations
- from the mean, regardless of the shape of the distribution. Chebyshev's theorem states:
- At least (1 – 1/k²) of
- the measurements fall within k standard deviations from the mean.
- Note: k must be greater than 1.
- For example, if you want to find out the minimum percentage of the data values that are
- within 2 standard deviations from the mean, you'd calculate:
- 1 – 1/2² = 1 – 1/4 = 0.75
- That is, for any data set, at least 75 percent of the data values are within two
- standard deviations from the mean.
- If you calculate the minimum percentage of values that lie within three
- standard deviations from the mean, you'll get an answer of "at least 89 percent."
- Chebyshev's theorem provides only lower bounds for the percentage of
- data values that lie within k (where k > 1) standard deviations from the
- mean, not exact percentages. The power of Chebyshev's theorem lies
- in the fact that it is true for any distribution, regardless of its shape.
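The lower bound from Chebyshev's theorem, 1 – 1/k², is easy to tabulate in Python. This is a minimal sketch; the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```

For k = 2 the bound is 75 percent and for k = 3 it is about 89 percent, the same figures given in the example above.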
The ____ _ ______ compares the standard deviation relative
to the mean of the distribution. For this reason, the CV is also known as the ______ ______ _____ (RSE).
- coefficient of variation;
- relative standard error
Here's how we calculate the CV...
- Think of the CV for any variable as the precision of the mean for that variable. Many
- federal agencies, such as the National Center for Health Statistics (NCHS), use the CV as
- a measure of the precision or reliability of estimates of health characteristics. The
- smaller the CV, the more reliable (precise) the estimate is. The larger the CV, the more
- unreliable it is.
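Here is a small Python sketch of the CV, assuming the usual definition: the sample standard deviation divided by the sample mean, expressed as a percentage. The two data sets are made up so that they share the same standard deviation but have different means.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

low_mean = [10, 12, 14, 16, 18]        # mean 14
high_mean = [100, 102, 104, 106, 108]  # mean 104, same spread

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)
print(f"{cv_low:.1f}% {cv_high:.1f}%")
```

The standard deviations are identical, but the CV is much larger for the data set with the smaller mean, so comparing standard deviations alone would miss the difference in relative dispersion.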
Shapes of distributions
- 1. Symmetrical Distribution- The mean, median, and mode share the same center value
- (mirrored appearance).
- 2. Uniform or Rectangular Distribution- Every class has the same frequency.
- 3. Skewed Distribution- One "tail" is longer than the other.
- If the longer tail is on the left, we say that the distribution is skewed to the
- left, or negatively skewed. If the longer tail is to the right, we say the
- distribution is skewed to the right, or positively skewed.
- 4. Bimodal Distribution- A histogram in which the two classes with the largest
- frequencies are separated by at least one class; the two peak frequencies
- need not be equal.