Elementary Statistics

1. Individuals
• objects described by a set of data
• people, animals, things
2. Variable
• any characteristic of an individual
• can take different values for different individuals
3. Categorical Variable
places an individual into one of several groups or categories
4. Quantitative Variable
• takes numerical values for which arithmetic operations like adding and averaging make sense
• usually recorded in a unit of measurement
5. Distribution of a Variable
what values it takes and how often it takes these values
6. Distribution of a Categorical Variable
lists categories and gives count or % of individuals who fall in each category
7. Roundoff error
percents that add to slightly more or less than 100% don't point to mistakes; they are the effect of rounding off results
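A minimal Python sketch of a categorical distribution (counts and percents) that also shows roundoff error. The helper name `categorical_distribution` and the favorite-color data are hypothetical, chosen so that the rounded percents visibly fail to sum to 100%.

```python
from collections import Counter

def categorical_distribution(values, decimals=1):
    """Return {category: (count, percent)}, with percents rounded to `decimals` places."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, round(100 * c / n, decimals)) for cat, c in counts.items()}

# Hypothetical data: favorite color of 30 students, 10 in each category.
colors = ["red"] * 10 + ["blue"] * 10 + ["green"] * 10
dist = categorical_distribution(colors)
for cat, (count, pct) in dist.items():
    print(f"{cat:6s} {count:3d} {pct:5.1f}%")

# Each percent is rounded to 33.3, so the column totals 99.9%: roundoff, not a mistake.
total = sum(pct for _, pct in dist.values())
print(f"total  {len(colors):3d} {total:5.1f}%")
```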
8. Pie Charts
• show distribution of a categorical variable as pie slices
• must include all the categories that make up the whole
• use a pie chart only when you want to emphasize each category's relation to the whole
9. Bar graphs
• represent each category as a bar
• bar height shows category counts or %'s
• can compare any set of quantities measured in the same units
10. Histogram
Most common graph of the distribution of one quantitative variable
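A histogram starts from counts of observations in equal-width classes. The sketch below is a minimal Python illustration under assumed choices: `histogram_counts` is a hypothetical helper, the exam scores are made up, and the classes 50-59, 60-69, ..., 90-99 are picked by hand (class choice is up to you with a histogram).

```python
def histogram_counts(data, start, width, n_classes):
    """Count observations in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)   # index of the class containing x
        if 0 <= k < n_classes:          # values outside the chosen classes are ignored
            counts[k] += 1
    return counts

# Hypothetical exam scores.
scores = [52, 61, 67, 70, 73, 75, 78, 81, 84, 88, 91, 95]
counts = histogram_counts(scores, start=50, width=10, n_classes=5)

# Text "bars": one * per observation in each class.
for k, c in enumerate(counts):
    low = 50 + 10 * k
    print(f"{low}-{low + 9}: " + "*" * c)
```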
11. Shape of a distribution
• symmetric: right and left sides are roughly mirror images
• skewed to the right: right side of the histogram extends much farther out than the left
• skewed to the left: left side of the histogram extends much farther out than the right
12. Center of a distribution
midpoint of the distribution
13. Range of a distribution
spans from the smallest value to the largest
14. Stemplot
• like a histogram turned on end using #'s
• you cannot choose the classes; the stems are set by the digits of the data
• preserves the actual value of each observation
• does not work well with large data sets
15. Stems and leaves
• parts of a stemplot
• stem: all but the final (rightmost) digit
• leaf: the final digit
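A minimal Python sketch of a stemplot, assuming the data are nonnegative whole numbers so that the stem is `x // 10` (all but the last digit) and the leaf is `x % 10` (the last digit). The helper name `stemplot` and the scores are hypothetical.

```python
def stemplot(data):
    """Return stemplot rows for nonnegative integers: stem = all but last digit, leaf = last digit."""
    leaves = {}
    for x in sorted(data):                  # sorting puts leaves in increasing order
        stem, leaf = divmod(x, 10)
        leaves.setdefault(stem, []).append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # include empty stems so gaps stay visible
        rows.append(f"{stem:>3} | " + "".join(str(leaf) for leaf in leaves.get(stem, [])))
    return rows

# Hypothetical exam scores.
scores = [52, 61, 67, 70, 73, 75, 78, 81, 84, 88, 91, 95]
out = stemplot(scores)
print("\n".join(out))
```

Because every leaf is an actual digit of an observation, each original value can be read back from the plot, unlike a histogram.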
16. Split stems
• double the # of stems when all the leaves would fall on just a few stems
• each stem appears twice
• leaves 0 to 4 go on the upper stem, leaves 5 to 9 go on the lower stem
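The split-stem rule can be sketched in Python as below, again assuming nonnegative whole numbers with the last digit as the leaf. `split_stemplot` and the data are hypothetical; the values are chosen so that an ordinary stemplot would crowd everything onto just two stems.

```python
def split_stemplot(data):
    """Return split-stem rows: each stem appears twice, leaves 0-4 first, then 5-9."""
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1       # 0 = upper copy (leaves 0-4), 1 = lower copy (5-9)
        rows.setdefault((stem, half), []).append(leaf)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):                # every stem is listed twice
            leaves = rows.get((stem, half), [])
            lines.append(f"{stem:>3} | " + "".join(str(leaf) for leaf in leaves))
    return lines

# Hypothetical data that would fall on only two stems (7 and 8) without splitting.
values = [70, 71, 73, 74, 75, 76, 78, 79, 80, 82, 83, 86, 87, 89]
split_rows = split_stemplot(values)
print("\n".join(split_rows))
```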
17. Timeplot
• plots each observation of a variable against the time at which it was measured
• always put time on the horizontal axis
• always put the variable you are measuring on the vertical axis
18. Cycles
regular up and down movements in a timeplot
19. Trend
long-term up or down movement over time
20. Time series data
• measures one variable at a specific location as it changes over time
• usually displayed with a timeplot
21. Cross-sectional data
measures a variable at many locations at the same time
22. Exploratory data analysis
using graphs and numerical summaries to describe the variables in a data set and the relations among them
23. Mean is not a resistant measure of center
• the mean is sensitive to a few extreme observations
• it is also pulled toward the long tail of a skewed distribution, even when there are no outliers
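A small Python illustration of why the mean is not resistant, using made-up numbers: adding a single extreme value to five ordinary observations moves the mean a long way. It uses the standard-library `statistics.mean`.

```python
from statistics import mean

# Hypothetical data: five observations, then the same five plus one extreme value.
base = [2, 3, 4, 5, 6]
with_outlier = base + [100]

mean_base = mean(base)          # (2 + 3 + 4 + 5 + 6) / 5 = 4
mean_out = mean(with_outlier)   # 120 / 6 = 20: one value drags the mean far to the right
print(mean_base, mean_out)
```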
24. Median M
• midpoint of the distribution
• half the observations are smaller than M and half are larger
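A minimal Python sketch of finding the median by hand: sort the observations, then take the middle one when n is odd, or the average of the two middle ones when n is even. The data are hypothetical and match the mean example's idea: one extreme value barely moves the median.

```python
def median(data):
    """Median: middle value of the sorted data, or the average of the two middle values."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                     # odd n: the single middle observation
    return (xs[mid - 1] + xs[mid]) / 2     # even n: average of the two middle observations

base = [2, 3, 4, 5, 6]
with_outlier = base + [100]
print(median(base))          # middle of 5 values
print(median(with_outlier))  # average of 4 and 5; the outlier hardly matters
```

The result should agree with the standard library's `statistics.median` on the same data.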
25. Five-number summary
• gives the smallest and largest observations as well as the median and the 1st and 3rd quartiles (Q1 and Q3)
• Minimum Q1 M Q3 Maximum
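A minimal Python sketch of the five-number summary. It assumes one common introductory-textbook convention for quartiles: Q1 is the median of the observations below the median's position and Q3 is the median of those above it (the median itself is excluded when n is odd). Software packages may use other quartile rules and give slightly different Q1 and Q3. The function names and scores are hypothetical.

```python
def _median_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """Return (Minimum, Q1, M, Q3, Maximum); needs at least 2 observations."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 observations")
    lower = xs[: n // 2]          # observations below the median's position
    upper = xs[(n + 1) // 2 :]    # observations above the median's position
    return (xs[0], _median_sorted(lower), _median_sorted(xs), _median_sorted(upper), xs[-1])

# Hypothetical exam scores (n = 12).
scores = [52, 61, 67, 70, 73, 75, 78, 81, 84, 88, 91, 95]
summary = five_number_summary(scores)
print(summary)
```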
26. Boxplot
• central box spans Q1 and Q3
• line in the box marks the median M
• lines extend from the box out to the smallest and largest observations
• best used for side-by-side comparison of two or more distributions
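For a rough side-by-side comparison without a plotting library, a boxplot can be drawn as one line of text per group on a shared scale. This is only a sketch under stated assumptions: `text_boxplot` is a hypothetical helper, it takes an already-computed five-number summary, every value must lie within the chosen scale `[lo, hi]`, and it draws the basic boxplot described above (lines out to the minimum and maximum, no separate outlier marks).

```python
def text_boxplot(summary, lo, hi, width=50):
    """Draw (Min, Q1, M, Q3, Max) as a text boxplot on the shared scale [lo, hi]."""
    def pos(x):
        # Map a data value to a character column 0..width-1.
        return round((x - lo) / (hi - lo) * (width - 1))

    mn, q1, m, q3, mx = (pos(v) for v in summary)
    row = [" "] * width
    for i in range(mn, mx + 1):
        row[i] = "-"            # lines from the minimum to the maximum
    for i in range(q1, q3 + 1):
        row[i] = "="            # the box from Q1 to Q3
    row[q1] = "["
    row[q3] = "]"
    row[m] = "|"                # the median line
    return "".join(row)

# Hypothetical five-number summaries for two groups, drawn on the same scale.
groups = {"Class A": (52, 68.5, 76.5, 86, 95), "Class B": (40, 55, 63, 72, 90)}
for name, s in groups.items():
    print(f"{name:8s} {text_boxplot(s, lo=40, hi=100, width=41)}")
```

Using the same `lo`, `hi`, and `width` for every group is what makes the rows comparable side by side.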
27. IQR
• Interquartile range
• distance between Q1 and Q3
• IQR = Q3 - Q1
28. 1.5 x IQR rule for outliers
call an observation an outlier if it is more than 1.5 x IQR above Q3 or below Q1
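A minimal Python sketch of the 1.5 × IQR rule. It assumes the same textbook quartile convention as a hand computation (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd); other quartile rules can shift the fences slightly. The helper name `iqr_outliers` and the scores, including the low value 12, are hypothetical.

```python
def _median_sorted(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def iqr_outliers(data):
    """Return (outliers, (low_fence, high_fence)) using the 1.5 x IQR rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 observations")
    q1 = _median_sorted(xs[: n // 2])          # median of the lower half
    q3 = _median_sorted(xs[(n + 1) // 2 :])    # median of the upper half
    iqr = q3 - q1                              # IQR = Q3 - Q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return outliers, (low_fence, high_fence)

# Hypothetical scores with one suspiciously low value.
scores = [12, 52, 61, 67, 70, 73, 75, 78, 81, 84, 88, 91, 95]
outliers, fences = iqr_outliers(scores)
print(outliers, fences)
```

Here Q1 = 64 and Q3 = 86, so IQR = 22 and 1.5 × IQR = 33; anything below 64 − 33 = 31 or above 86 + 33 = 119 is flagged, which catches only the score of 12.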
