Statistics

categorical data

values that fall into separate, non-overlapping categories such as marital status or hair color
quantitative data

values that have measurement units such as dollars, degrees, inches, etc.
five-number summary

minimum value, Q1, the median, Q3, and maximum value
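As a rough illustration of the five-number summary, the sketch below uses the median-of-halves convention for Q1 and Q3 (an assumption; textbooks and software differ on how quartiles are computed). The function name and the sample data are hypothetical, and the function expects at least two values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]   # hypothetical data
summary = five_number_summary(scores)
print(summary)
```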
mean
- x-bar = (sum of values) / (number of values)
- not resistant to extreme values
median
- middle of data set when the data have been ordered
- resistant to outliers
- a more appropriate measure of center when outliers are present or distribution is skewed
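A small sketch of why the median is called resistant and the mean is not. The income figures are made up for illustration; only the comparison matters.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]     # hypothetical values, in thousands of dollars
with_outlier = incomes + [400]     # the same data plus one extreme value

# The mean is pulled far toward the extreme value; the median barely moves.
print("mean:  ", mean(incomes), "->", mean(with_outlier))
print("median:", median(incomes), "->", median(with_outlier))
```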
standard deviation
- measure of spread (variation)
- not resistant to outliers
Interquartile Range (IQR)
- IQR=Q3-Q1
- gives the spread of the middle 50% of the data
- resistant to outliers
range
- maximum-minimum
- single number
- extremely sensitive to outlying values
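The three measures of spread above can be compared on the same data. This is a sketch on made-up numbers; the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one convention among several and may not match a calculator exactly.

```python
from statistics import quantiles, stdev

def spread(values):
    """Return the standard deviation, IQR, and range of a data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"sd": stdev(values), "iqr": q3 - q1, "range": max(values) - min(values)}

data = [30, 32, 35, 38, 40]        # hypothetical data
before = spread(data)
after = spread(data + [400])       # add one outlier

# The SD and range change dramatically; the IQR changes only slightly.
print(before)
print(after)
```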
z-scores

z=(data value-mean)/SD
normal models/empirical rule

68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
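The empirical-rule percentages can be checked against a normal model directly. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal model within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```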
finding normal percentiles
- 1. identify the variables and state the problem in terms of the observed variables
- 2. standardize the values by converting to z-scores
finding percentile (calculator)
- with z-scores: 2nd DISTR --> normalcdf(lower bound, upper bound), with both bounds given as z-scores
- without z-scores: 2nd DISTR --> normalcdf(lower bound, upper bound, mean, standard deviation), with both bounds given as data values
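For readers without the calculator, here is a sketch of the same two calculations in Python. The function `normalcdf` below is a hypothetical stand-in that mimics the calculator's argument order, not the calculator's own code, and the exam-score numbers (mean 70, SD 10) are made up.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Standardize a data value: z = (data value - mean) / SD."""
    return (value - mean) / sd

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Proportion of a normal model that falls between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

NEG_INF = float("-inf")   # no lower bound

# Proportion of scores below 85 when the model is Normal(mean=70, SD=10).
z = z_score(85, 70, 10)
p_with_z = normalcdf(NEG_INF, z)              # bounds given as z-scores
p_without_z = normalcdf(NEG_INF, 85, 70, 10)  # bounds given as data values
print(z, round(p_with_z, 4), round(p_without_z, 4))
```

Both calls describe the same area under the curve, so they should agree up to rounding.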
explanatory variable

defines the groups to be compared with respect to values of the response variables
response variable

the variable you hope to predict or explain; the outcome
correlation coefficient (r)
- no units
- requires quantitative variables
- -1 ≤ r ≤ 1
- r=0 represents no linear correlation
- r only measures the strength of a linear relationship
- not resistant to outliers
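A minimal sketch of computing r by hand, using the standard formula based on deviations from the means. The data sets are made up. The second one shows why r = 0 does not mean "no relationship": the points follow a perfect curve, but not a line.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_line = correlation([1, 2, 3, 4], [2, 4, 6, 8])             # points on a straight line
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])    # points on y = x squared
print(r_line, r_curve)
```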
least squares regression
- y-hat=b0+b1x
- y-hat=a+bx (alternative notation)
- b0 (a) is the y-intercept
- b1 (b) is the slope in "y-units per x-unit"
residual
- observed value - predicted value (y - y-hat)
- for a least squares line, the sum of the residuals is always equal to 0
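The sketch below fits a least squares line with the usual slope and intercept formulas and then checks that the residuals sum to zero (up to floating-point rounding). The data and the function name are hypothetical.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar   # the line passes through (x-bar, y-bar)
    return b0, b1

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)

# residual = observed value - predicted value
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, sum(residuals))
```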
performing a simulation
- 1. identify the trial to be repeated
- 2. state how you will model the random occurrence of an outcome
- 3. explain how you will simulate the trial
- define the response variable
- 4. run several trials
- 5. summarize the results across all trials
- 6. describe what your simulation shows and draw your conclusions about the real world
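The six steps above can be sketched for one concrete question: how often do two dice show a sum of seven? The choice of question, the function name, and the fixed seed are assumptions made for illustration.

```python
import random

def simulate_sum_of_seven(num_trials, seed=0):
    """Estimate the chance that two dice sum to seven."""
    rng = random.Random(seed)   # fixed seed so the run can be repeated
    hits = 0
    for _ in range(num_trials):                            # step 4: run many trials
        total = rng.randint(1, 6) + rng.randint(1, 6)      # steps 1-3: one trial = roll two dice
        if total == 7:                                     # response variable: is the sum seven?
            hits += 1
    return hits / num_trials                               # step 5: summarize across trials

estimate = simulate_sum_of_seven(20000)
print(estimate)   # step 6: compare with the theoretical value 1/6
```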
population

entire group of individuals that we hope to learn about
sample

a smaller group of individuals selected from the population
parameter
- a number that characterizes some aspect of the population such as the mean or standard deviation of some variable of the population
- Greek letters
statistic
- values calculated for sample data
- used to estimate values in the population (parameters)
- standard letters
simple random sample

each possible sample of n individuals has an equal chance of selection
stratified random sample
- population is first broken up into homogeneous groups called strata
- strata have something in common that affects the response variable
cluster sampling

divides the population into heterogeneous groups called clusters and then takes an SRS of some of the clusters
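The three sampling designs above differ mainly in where the randomness is applied. The sketch below is one way to express that, with hypothetical function names and made-up groups. The cluster version assumes that every individual in a chosen cluster is included.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible sample of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside each stratum (dict: name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, num_clusters, rng):
    """Take an SRS of whole clusters and keep everyone in the chosen ones."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(0)
population = list(range(100))                      # hypothetical ID numbers
strata = {"freshmen": population[:50], "seniors": population[50:]}
clusters = {f"room_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
print(srs, stratified, clustered, sep="\n")
```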
bias in sampling methods
- undercoverage
- voluntary response bias
- convenience sample
- nonresponse
- response bias
observational study

researchers observe individuals and record variables of interest but do not impose a treatment
experiment
- researcher deliberately imposes a treatment
- must identify at least one response and explanatory variable
- used to determine a cause-and-effect relationship
block design

groups based on a certain characteristic that they share that may affect the results of the experiment
matched pairs design
- a form of block design
- each subject receives both treatments
control

no treatment/traditional treatment/placebo
lurking variables

variables that we did not think to measure but which can affect the response variable
randomize

reduces bias by equalizing the effects of lurking variables
replicate
- should include many subjects in a comparative experiment
- experiment should be designed so that other researchers can replicate our results
single blind

subjects do not know which treatment group they have been assigned OR those who evaluate the results of the experiment do not know how subjects have been allocated to treatment groups
double blind

NEITHER the subject nor the evaluators know how the subjects have been allocated to treatment groups
confounding

cannot separate the effect of a treatment (explanatory variable) from the effects of other influences (confounding variables) on the response variable
trial

single attempt or realization of a random phenomenon (rolling a pair of dice)
outcome

the value measured, observed, or reported for each trial (the faces shown on the dice)
sample space

the set of all possible outcomes
event
- collection of outcomes
- usually designated by capital letters
- e.g., A = rolling a sum of seven when two dice are rolled
the law of large numbers

the long-run relative frequency of repeated independent events gets closer and closer to the true probability as the number of trials increases
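A quick way to see the law of large numbers is to track the proportion of heads as coin flips accumulate. The sketch below assumes a fair coin and uses a fixed seed; the checkpoints are arbitrary.

```python
import random

def running_proportion(num_flips, seed=1):
    """Proportion of heads after 10, 100, 1000, ... flips of a fair coin."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for i in range(1, num_flips + 1):
        heads += rng.random() < 0.5          # True counts as 1
        if i in (10, 100, 1000, 10000, 100000):
            checkpoints[i] = heads / i
    return checkpoints

proportions = running_proportion(100000)
for flips, share in proportions.items():
    print(flips, share)
```

Early proportions can wander well away from 0.5; the later ones should settle close to it.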
independence

if the occurrence of one event does not alter the probability that the other event occurs
complement of event A

the set of all possible outcomes that are NOT in the event A
disjoint

cannot occur together
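The probability terms above (sample space, event, complement, independence, disjoint) can all be seen in one small example by listing every outcome for two dice. The events A, B, and C are chosen for illustration, and the code assumes all 36 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) outcome.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if sum(o) == 7}   # sum of seven
B = {o for o in sample_space if o[0] == 3}     # first die shows 3
C = {o for o in sample_space if sum(o) == 2}   # sum of two

not_A = sample_space - A                       # complement of A
p_A_given_B = prob(A & B) / prob(B)            # chance of A once B has occurred

print(prob(A), prob(not_A))   # complement: the two add up to 1
print(p_A_given_B)            # equals prob(A), so B does not alter A: independent
print(A & C)                  # empty set: A and C are disjoint
```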

Author

Melina.gonzales

216009

Card Set

Statistics

Description

AP Statistics exam flash cards

Updated

4/27/2013, 3:42:54 AM
