Stats Chapter 1
Individuals can be people,
animals, plants, or any object of interest.
Variable
any characteristic of an individual. A variable varies among individuals.
Distribution of a variable
tells us what values the variable takes and how often it takes these values.
Quantitative variable
Something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.
Categorical variable
Something that falls into one of several categories. What can be counted is the count or proportion of individuals in each category.
Ways to chart categorical data
Bar graphs and pie charts
Bar graph
Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category.
Pie chart
Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.
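The counts and percentages that a bar graph or pie chart displays can be sketched in a few lines. This is only an illustration: the `eye_colors` data and the text "bars" are made up for the example and are not part of the card set.

```python
from collections import Counter

# Hypothetical categorical data: one category label per individual.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)      # bar heights expressed as counts
total = sum(counts.values())

# Pie-slice sizes: each category's percent of the whole.
percents = {category: 100 * n / total for category, n in counts.items()}

for category, n in counts.most_common():
    # One text "bar" per category; its length is the count.
    print(f"{category:<6} {'#' * n} {n} ({percents[category]:.1f}%)")
```

The percentages always add up to 100 because every individual falls into exactly one category.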
Ways to chart quantitative data
Histograms, stemplots, and line graphs (time plots)
Line graphs: time plots
Use when there is a
meaningful sequence, like time. The line connecting the points helps emphasize
any change over time
Histograms and stemplots
These are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
- The range of values that a
variable can take is divided into equal size intervals.
- The histogram shows the
number of individual data points that fall in each interval.
- To compare two related distributions, a back-to-back stem plot with common stems is useful.
- Stem plots do not work well for large data sets.
- When the observed values have too many digits, trim the numbers before making a stem plot.
- When plotting a moderate number of observations, you can split each stem.
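As a rough sketch of the two graphs described above, the code below counts observations in equal-width intervals (the numbers a histogram displays) and groups two-digit values into stems and leaves. The function names, the data, and the choice of the tens digit as the stem are assumptions made for this example.

```python
from collections import defaultdict

def histogram_counts(values, low, width, n_bins):
    """Count the values in each of n_bins equal-width intervals.

    Interval i covers [low + i*width, low + (i+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def stemplot(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits).

    Assumes non-negative two-digit integers; stems with no
    observations are simply absent from the result.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Hypothetical data set of 10 observations.
data = [12, 15, 21, 23, 23, 28, 31, 34, 39, 47]

print(histogram_counts(data, low=10, width=10, n_bins=4))
for stem, leaves in stemplot(data).items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

With only ten observations the stem plot keeps every data value visible, which is why it suits small data sets better than large ones.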
We can describe the overall pattern of a histogram by its shape, center, and spread.
A distribution is symmetric if ...
the right and left sides
of the histogram are approximately mirror images of each other.
A distribution is skewed to the right
if the right side of the histogram (side with larger values) extends much farther out than the left side
A distribution is skewed to the left
if the left side of
the histogram extends much farther out than the right side
An important kind of
deviation is an outlier
Outliers are observations that lie outside the overall
pattern of a distribution
A trend is
a rise or fall that persists over time, despite small irregularities.
Seasonal variation
A pattern that repeats itself at regular intervals of time
Mean
add all values, then divide by the number of individuals. It is the “center of mass.”
Median
the midpoint of a distribution: the number such that half of the observations are smaller and half are larger
Comparing the mean and the median
The mean and the median are the same only if the distribution is symmetrical. The median is a measure of center that is resistant to skew and outliers. The mean is not.
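A small sketch of why the median is called resistant: adding one extreme value to a symmetric sample moves the mean a lot and the median very little. The sample values are made up for illustration.

```python
from statistics import mean, median

# Hypothetical symmetric sample: mean and median agree.
data = [2, 3, 4, 5, 6]

# The same sample with one extreme high value added.
with_outlier = data + [40]

print("symmetric:    mean =", mean(data), " median =", median(data))
print("with outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

Here the single outlier pulls the mean from 4 to 10, while the median only moves from 4 to 4.5.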
first quartile, Q1
value in the sample that has 25% of the data at or below it
third quartile, Q3
value in the sample that has 75% of the data at or below it
1.5 × IQR rule for outliers
An observation is a suspected outlier if it falls more than 1.5 times the size of the interquartile range (IQR = Q3 − Q1) above the third quartile or below the first quartile.
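The rule can be sketched as follows: compute Q1 and Q3, take IQR = Q3 − Q1, and flag any value above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR. The quartile convention used here (medians of the lower and upper halves, leaving out the overall median when the count is odd) is an assumption; software packages use several slightly different conventions. The function names and data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of observations, the overall
    median is left out of both halves. Needs at least 2 values.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]     # observations below the median position
    upper = xs[-half:]    # observations above the median position
    return median(lower), median(upper)

def suspected_outliers(values):
    """Values more than 1.5 * IQR above Q3 or below Q1."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical sample with one unusually large value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 30]

print(quartiles(data))
print(suspected_outliers(data))
```

For this sample Q1 = 3.5 and Q3 = 8.5, so the IQR is 5 and the fences sit at −4 and 16; only 30 lies outside them.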
Properties of the standard deviation s
- s measures spread about the mean and should be used only when the mean is the measure of center.
- s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
- s is not resistant to outliers.
- s has the same units of measurement as the original observations.
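The properties above can be checked with a short sketch. It assumes the usual sample standard deviation, which divides the sum of squared deviations from the mean by n − 1; the card set itself does not spell out the formula. The helper name `sample_sd` and the data are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1 (assumed convention)."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))

same = [5, 5, 5, 5]             # no spread at all
spread = [2, 4, 4, 6]           # some spread about the mean
with_outlier = spread + [30]    # one extreme value added

print(sample_sd(same))          # zero: all observations are equal
print(sample_sd(spread))
print(sample_sd(with_outlier))  # much larger: s is not resistant
```

The standard library's `statistics.stdev` uses the same n − 1 convention, so it can serve as a cross-check on the hand-written version.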
Linear transformations (such as changing the unit of measurement)
do not change the basic shape of a distribution (skew, symmetry, multimodal). But they do change the measures of center and spread.
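A short sketch of how a linear transformation x_new = a + b·x (with b > 0) acts on the summary measures: the mean and median become a + b times their old values, while the standard deviation is only multiplied by b. The Celsius-to-Fahrenheit conversion and the temperatures are chosen purely as an illustration.

```python
from statistics import mean, median, stdev

# Hypothetical temperatures in degrees Celsius.
celsius = [10.0, 15.0, 20.0, 25.0, 40.0]

# Linear transformation x_new = a + b * x (Celsius to Fahrenheit).
a, b = 32.0, 1.8
fahrenheit = [a + b * c for c in celsius]

# Center: shifted by a and scaled by b.
print(mean(fahrenheit), a + b * mean(celsius))
print(median(fahrenheit), a + b * median(celsius))

# Spread: scaled by b only; adding a does not change it.
print(stdev(fahrenheit), b * stdev(celsius))
```

Each pair of printed values agrees up to floating-point rounding, and the ordering and relative spacing of the observations are unchanged, which is why the shape of the distribution stays the same.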
Density curve
The total area under the curve, by definition, is equal to 1, or 100%.
The area under the curve for a range of values is the proportion of all observations for that range.
The median of a density curve is
the equal-areas point: the point that divides the area under the curve in half.
The mean of a density curve is
the balance point: the point at which the curve would balance if it were made of solid material.
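These ideas can be checked numerically on a simple density curve. The sketch below uses a hypothetical right-skewed density, f(x) = 2(1 − x) on the interval from 0 to 1, and approximates areas with a midpoint sum; neither the curve nor the method comes from the card set.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1]."""
    return 2.0 * (1.0 - x)

N = 100_000                                   # number of thin strips
dx = 1.0 / N
mids = [(i + 0.5) * dx for i in range(N)]     # midpoint of each strip

# Total area under the curve: should be 1.
total_area = sum(density(x) * dx for x in mids)

# Area over a range = proportion of observations in that range.
area_0_to_half = sum(density(x) * dx for x in mids if x < 0.5)

# Mean: the balance point, the area-weighted average of x.
balance_point = sum(x * density(x) * dx for x in mids)

# Median: accumulate area from the left until half of it is reached.
area = 0.0
equal_areas_point = None
for x in mids:
    area += density(x) * dx
    if area >= 0.5:
        equal_areas_point = x
        break

print(total_area, area_0_to_half, balance_point, equal_areas_point)
```

For this curve the exact values are 1 for the total area, 0.75 for the area between 0 and 0.5, 1/3 for the mean, and 1 − 1/√2 (about 0.293) for the median. The mean sits to the right of the median, pulled toward the long right tail.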
Normal (or Gaussian) distributions
a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma)
z-score
measures the number of standard deviations that a data value x is from the mean μ.
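Standardizing can be sketched as z = (x − mean) / standard deviation. The example below also uses Python's `statistics.NormalDist` to find the area under the standard Normal curve to the left of z. The mean of 64.5 and standard deviation of 2.5 are hypothetical values chosen for illustration, and `z_score` is a helper name introduced here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical Normal distribution (values chosen for illustration).
mu, sigma = 64.5, 2.5

z = z_score(67.0, mu, sigma)

# Proportion of observations below x: area to the left of z
# under the standard Normal curve (mean 0, standard deviation 1).
proportion_below = NormalDist().cdf(z)

print(z)
print(round(proportion_below, 4))
```

A value of 67 is exactly one standard deviation above the mean, so z = 1.0, and about 84% of observations fall below it.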