The flashcards below were created by user
ametzga
on FreezingBlue Flashcards.

Individuals
 objects described by a set of data
 people, animals, things

Variable
 any characteristic of an individual
 can take different values for different individuals

Categorical Variable
places an individual into one of several groups or categories

Quantitative Variable
 takes numerical values for which arithmetic operations like adding and averaging make sense
 usually recorded in a unit of measurement

Distribution of a Variable
what values it takes and how often it takes these values

Distribution of a categorical Variable
lists categories and gives count or % of individuals who fall in each category
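
A minimal sketch of such a distribution, assuming the categories arrive as a plain list of labels. The function name and the eye-color data are hypothetical, made up for illustration.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical data: eye color of 8 individuals.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]
dist = categorical_distribution(eye_colors)

for cat, (n, pct) in dist.items():
    print(f"{cat}: {n} ({pct:.1f}%)")
```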

Roundoff error
does not point to a mistake, only to the effect of rounding off results
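
A small illustration with made-up counts: three equally sized categories each round to 33%, so the rounded percents total 99 rather than 100.

```python
# Hypothetical counts: three categories with 1 individual each.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Percent for each category, rounded to the nearest whole percent.
rounded = {cat: round(100 * n / total) for cat, n in counts.items()}

print(rounded)
print(sum(rounded.values()))  # 99, not 100: roundoff error, not a mistake
```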

Pie Charts
 show distribution of a categorical variable as pie slices
 must include all the categories that make up the whole
 use a pie chart only when you want to emphasize each category's relation to the whole

Bar graphs
 represent each category as a bar
 bar height shows category counts or %'s
 can compare any set of quantities measured in the same units

Histogram
Most common graph of the distribution of one quantitative variable

Shape of a distribution
 symmetric: right and left sides are roughly mirror images
 skewed to the right: right side of histogram extends much farther than the left
 skewed to the left: left side of histogram extends much farther than the right

Center of a distribution
midpoint of the distribution

Spread of a distribution
the range of the distribution, from the smallest value to the largest

Stemplot
 like a histogram turned on end using #'s
 cannot choose classes; they are given to you
 preserves actual value of each observation
 does not work well with large data sets

stems and leaves
 Parts of a stemplot
 stem: all but the final (rightmost) digit
 leaf: the final digit

Split stems
 double the # of stems when all the leaves would fall on just a few stems
 each stem appears twice
 0 to 4 go on upper stem, 5 to 9 go on lower stem
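
A rough sketch of a stemplot builder with optional split stems, assuming the data are non-negative whole numbers. The function name `stemplot` and the sample data are hypothetical.

```python
from collections import defaultdict

def stemplot(data, split=False):
    """Return stemplot lines for non-negative integers.

    Stem = all but the final digit, leaf = the final digit.
    With split=True each stem appears twice: leaves 0-4 on the
    upper stem, leaves 5-9 on the lower stem.
    """
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    halves = (0, 1) if split else (0,)
    lines = []
    # Include every stem between the smallest and largest, even empty ones.
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for half in halves:
            leaves = "".join(str(leaf) for leaf in rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical observations.
data = [12, 15, 21, 23, 23, 27, 30, 34, 38]
print("\n".join(stemplot(data)))
print()
print("\n".join(stemplot(data, split=True)))
```

Every observation can still be read back from the plot, which is the sense in which a stemplot preserves the actual values.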

Timeplot
 plots each observation of a variable against the time at which it was measured
 always put time on the horizontal axis
 always put the variable you are measuring on the vertical axis

Cycles
regular up and down movements in a timeplot

Trend
long term up or down movement over time

Time series data
 change in one variable at a specific location over time
 timeplot

Cross sectional data
displays a variable at many locations at the same time

Exploratory data analysis
using graphs and numerical summaries to describe the variables in a data set and the relations among them

mean is not a resistant measure of center
 mean is sensitive to a few extreme observations
 in a skewed distribution, even one without outliers, the mean is pulled toward the long tail

Median M
 midpoint of distribution
 such that half the observations are smaller and half larger
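
A quick check of resistance using Python's standard `statistics` module. The numbers are made up: the second list is the first with one observation replaced by an extreme value.

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme observation.
data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

# Original data: mean and median agree.
print(mean(data), median(data))

# One extreme value drags the mean upward; the median does not move.
print(mean(with_outlier), median(with_outlier))
```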

five number summary
 gives the smallest and largest observations as well as the median and the first and third quartiles
 Minimum Q1 M Q3 Maximum
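
The card does not say how the quartiles are computed. The sketch below assumes one common textbook convention: Q1 is the median of the observations below the position of the overall median, and Q3 is the median of those above it. Other conventions can give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum) for at least two observations.

    Assumed convention: Q1 and Q3 are the medians of the observations
    below and above the position of the overall median.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations below the median's position
    upper = xs[half + n % 2:]    # observations above the median's position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Hypothetical observations.
data = [13, 15, 16, 16, 19, 20, 22, 23, 25, 40]
print(five_number_summary(data))
```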

Boxplot
 central box spans Q1 and Q3
 line in box marks median M
 lines extend from box out to smallest and largest observations
 best used for side by side comparison of more than one distribution

IQR
 Interquartile range
 distance between Q1 and Q3
 IQR = Q3 - Q1

1.5 x IQR rule for outliers
call an observation an outlier if it is more than 1.5 x IQR above Q3 or below Q1
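
A minimal sketch of the rule, taking Q1 and Q3 as given so that it does not depend on any particular quartile convention. The function name, the data, and the quartile values passed in are hypothetical.

```python
def iqr_outliers(data, q1, q3):
    """Return observations more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical observations, with Q1 = 16 and Q3 = 23 supplied by hand.
data = [13, 15, 16, 16, 19, 20, 22, 23, 25, 40]

# IQR = 7, so the fences are 16 - 10.5 = 5.5 and 23 + 10.5 = 33.5.
print(iqr_outliers(data, q1=16, q3=23))
```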

