Stats
Card Set Information
Author:
Anonymous
ID:
6102
Filename:
Stats
Updated:
2010-02-05 02:21:20
Tags:
stats ch 1
Folders:
Description:
Stats Chapter 1
individuals
Individuals can be people,
animals, plants, or any object of interest.
variable
any characteristic of an
individual. A variable varies
among individuals.
distribution
tells us what values the
variable takes and how often it takes these values.
quantitative variable
Something that takes
numerical values for which arithmetic operations, such as adding and averaging,
make sense.
categorical variable
A variable that places each individual into one of several categories.
What can be summarized is the count or proportion of individuals in each category.
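The count and proportion for each category can be tabulated in a few lines. The following is a minimal sketch; the function name `category_distribution` and the eye-colour data are made up for illustration.

```python
from collections import Counter

def category_distribution(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical data: eye colour recorded for eight individuals.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]
dist = category_distribution(eye_colour)
for cat, (n, p) in dist.items():
    print(f"{cat}: count={n}, proportion={p:.3f}")
```

These counts are what a bar graph would plot, and the proportions are what a pie chart would show as slices.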
Ways to chart categorical data
Bar graphs and pie charts
Bar graphs
Each category is
represented by one bar. The bar’s height shows the count (or sometimes the
percentage) for that particular category
Pie charts
Each slice represents a piece of one whole. The size of
a slice depends on what percent of the whole this category represents.
Ways to chart quantitative data
Histograms, stemplots, and line graphs (time plots)
Line graphs: time plots
Use when there is a
meaningful sequence, like time. The line connecting the points helps emphasize
any change over time
Histograms and stemplots
These are summary graphs
for a single variable. They are very useful to understand the pattern of
variability in the data
Histograms
- The range of values that a
variable can take is divided into equal size intervals.
- The histogram shows the
number of individual data points that fall in each interval.
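The two steps above (cut the range into equal-size intervals, then count the data points in each) can be sketched as follows. The function name `histogram_counts` and the exam scores are hypothetical, and the sketch assumes the data are not all identical, so the interval width is not zero.

```python
def histogram_counts(data, n_bins):
    """Count how many observations fall in each of n_bins equal-width intervals."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins          # assumes hi > lo
    counts = [0] * n_bins
    for x in data:
        # Each interval includes its left edge; the maximum value is
        # placed in the last interval rather than in a bin of its own.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts

# Hypothetical exam scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 92]
edges, counts = histogram_counts(scores, 4)
for k, c in enumerate(counts):
    print(f"{edges[k]:.0f} to {edges[k + 1]:.0f}: {'*' * c}")
```

Changing `n_bins` changes how the same data look, so it is worth trying a few values before describing the shape.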
stemplots
- To compare two related distributions, a back-to-back stemplot with common stems is useful.
- Stemplots do not work well for large datasets.
- When the observed values have too many digits, trim the numbers before making a stemplot.
- When plotting a moderate number of observations, you can split each stem.
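As an illustration of plain and split stems, here is a small sketch that assumes non-negative whole numbers, with the tens as the stem and the units digit as the leaf. The function name `stemplot` and the scores are hypothetical; values with more digits would first be trimmed, for example with `x // 10`.

```python
from collections import defaultdict

def stemplot(data, split=False):
    """Return the lines of a stemplot (stem = tens, leaf = units digit)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        # With split stems, leaves 0-4 and 5-9 go on separate rows.
        half = leaf // 5 if split else 0
        leaves[(stem, half)].append(str(leaf))
    # Show every stem between the smallest and largest, even if it is empty.
    stems = range(min(data) // 10, max(data) // 10 + 1)
    halves = (0, 1) if split else (0,)
    return [f"{s:>2} | {''.join(leaves[(s, h)])}" for s in stems for h in halves]

# Hypothetical exam scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 92]
print("\n".join(stemplot(scores)))
print()
print("\n".join(stemplot(scores, split=True)))
```

Splitting doubles the number of rows, which helps when a few stems would otherwise hold most of the leaves.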
Interpreting histograms
We can describe the overall pattern of a histogram by its shape, center, and spread.
A distribution is symmetric if ...
the right and left sides
of the histogram are approximately mirror images of each other.
A distribution is skewed to the right
if the right side of
the histogram (side with larger values) extends much farther out than the left
side
A distribution is skewed to the left
if the left side of
the histogram extends much farther out than the right side
An important kind of
deviation is an outlier
Outliers are observations that lie outside the overall
pattern of a distribution
A trend is
a rise or fall that persists over time, despite small irregularities.
seasonal variation
A pattern that repeats itself at regular intervals of time
mean
add all values, then divide by the number of individuals. It is the “center of
mass.”
median
the midpoint of a
distribution—the number such that half of the observations are smaller and half are larger
Comparing the mean and the median
The mean and the median are equal when the distribution is exactly symmetric,
and close together when it is roughly symmetric. The median is a measure of
center that is resistant to skew and outliers. The mean is not.
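A quick way to see the difference in resistance is to add one extreme value to a small data set and recompute both measures. The numbers below are invented for illustration.

```python
import statistics

incomes = [30, 32, 35, 38, 40]      # hypothetical values, in thousands
with_outlier = incomes + [400]      # the same data plus one extreme observation

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.2f}, median={median_after}")
```

The single extreme value pulls the mean far to the right while the median barely moves.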
first quartile, Q1
the value
in the sample that has 25% of the data at or below it
third quartile, Q3
the value
in the sample that has 75% of the data at or below it
1.5 × IQR rule for outliers
An observation is a suspected outlier if it falls more than 1.5
times the size of the interquartile range (IQR = Q3 − Q1) above the third quartile or
below the first quartile
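In its usual form, the rule flags an observation that lies more than 1.5 × IQR below Q1 or above Q3. The sketch below assumes one common textbook convention for the quartiles (the medians of the lower and upper halves of the sorted data); other conventions exist and can give slightly different values. The function names and data are hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both
    halves. Needs at least two observations.
    """
    xs = sorted(data)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[-half:])

def suspected_outliers(data):
    """Apply the 1.5 x IQR rule and return the flagged observations."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

values = [30, 32, 35, 38, 40, 400]   # hypothetical data
print(quartiles(values))
print(suspected_outliers(values))
```

A flagged value is only a suspect: it still has to be checked against how the data were collected before it is dropped or kept.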
variance s², standard deviation s
- s measures spread about the mean and should be used only when the mean is the measure of center.
- s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
- s is not resistant to outliers.
- s has the same units of measurement as the original observations.
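The card does not spell out the formula, so the sketch below assumes the usual sample definition: the variance is the sum of squared deviations from the mean divided by n − 1, and s is its square root. The function name and the data are hypothetical.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

observations = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
s2 = sample_variance(observations)
s = math.sqrt(s2)

print(f"variance s^2 = {s2:.3f}, standard deviation s = {s:.3f}")
# The standard library gives the same standard deviation.
print(f"statistics.stdev -> {statistics.stdev(observations):.3f}")
# When every observation is identical there is no spread.
print(sample_variance([3, 3, 3]))
```

Because the deviations are squared, the variance is in squared units; taking the square root is what brings s back to the units of the data.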
linear transformations
Linear transformations do not change the basic shape of a distribution (skew, symmetry,
multimodality), but they do change the measures of center and spread.
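To make the effect on center and spread concrete, write a linear transformation as x_new = a + b·x (the symbols a and b are introduced here for illustration). The mean then becomes a + b·mean and the standard deviation becomes |b|·s. The sketch below checks this on invented temperatures converted from Celsius to Fahrenheit.

```python
import math
import statistics

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]   # hypothetical temperatures
a, b = 32.0, 1.8                            # x_new = a + b * x
fahrenheit = [a + b * c for c in celsius]

mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

print(f"Celsius:    mean={mean_c:.2f}, s={sd_c:.2f}")
print(f"Fahrenheit: mean={mean_f:.2f}, s={sd_f:.2f}")
print(math.isclose(mean_f, a + b * mean_c))   # center is shifted and rescaled
print(math.isclose(sd_f, abs(b) * sd_c))      # spread is only rescaled
```

Adding the constant a moves the center but leaves the spread alone; only the multiplier b affects s.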
density curve
The
total area under the curve, by definition, is equal to 1, or 100%.
The area under the
curve for a range of values is the proportion of all observations for that
range
median of a density curve is
the equal-areas point
the
point that divides the area under the curve in half.
mean of a density curve is
the balance point
at which the curve would balance if it were made of solid material.
Normal – or Gaussian –
distributions
a family of symmetrical,
bell-shaped density curves defined by a mean μ (mu) and a standard deviation
σ (sigma): N(μ, σ).
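For a Normal density curve, the area over a range of values can be computed with Python's standard library. This is a sketch only: the mean of 64.5 and standard deviation of 2.5 are invented, and `proportion_between` is a hypothetical helper.

```python
from statistics import NormalDist

# Hypothetical Normal distribution with mean 64.5 and standard deviation 2.5.
heights = NormalDist(mu=64.5, sigma=2.5)

def proportion_between(dist, lo, hi):
    """Area under the density curve between lo and hi."""
    return dist.cdf(hi) - dist.cdf(lo)

within_one_sd = proportion_between(heights, 62.0, 67.0)
below_mean = heights.cdf(64.5)

print(f"proportion within one standard deviation of the mean: {within_one_sd:.4f}")
print(f"proportion below the mean: {below_mean:.4f}")
```

Because the curve is symmetric, half of the area lies below the mean, so the mean and the median of a Normal curve coincide.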
z-score
measures the number of standard deviations that a data value x
is from the mean μ.
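Written as a formula, this is z = (x − mean) / (standard deviation). A minimal sketch, using invented values for the mean and standard deviation:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical distribution with mean 64.5 and standard deviation 2.5.
z_above = z_score(67.0, 64.5, 2.5)
z_below = z_score(62.0, 64.5, 2.5)
print(z_above, z_below)
```

A positive z-score means the value is above the mean and a negative one means it is below.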