-
categorical data
values that fall into separate, non-overlapping categories such as marital status or hair color
-
quantitative data
values that have measurement units, such as dollars, degrees, or inches
-
five-number summary
minimum value, Q1, the median, Q3, and maximum value
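
A minimal Python sketch of the five-number summary, using only the standard library. The function name `five_number_summary` and the sample data are made up for illustration. Quartile conventions differ between textbooks, calculators, and software, so Q1 and Q3 may not match another tool exactly.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # other tools may report slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```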
-
mean
- x-bar = (sum of values)/(number of values)
- not resistant to extreme values
-
median
- middle of data set when the data have been ordered
- resistant to outliers
- a more appropriate measure of center when outliers are present or distribution is skewed
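
A short sketch of why the median is called resistant and the mean is not. The salary figures are hypothetical, chosen only so that one value is far from the rest.

```python
import statistics

salaries = [40, 42, 45, 47, 50]     # hypothetical values, in thousands of dollars
with_outlier = salaries + [500]     # the same data plus one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)

# The mean is pulled far toward the extreme value; the median barely moves.
print(mean_before, mean_after)
print(median_before, median_after)
```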
-
standard deviation
- measure of spread (variation)
- not resistant to outliers
-
Interquartile Range (IQR)
- IQR=Q3-Q1
- gives the spread of the middle 50% of the data
- resistant to outliers
-
range
- maximum-minimum
- single number
- extremely sensitive to outlying values
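
The three measures of spread can be compared side by side. This is a sketch under stated assumptions: `spread_measures` is a hypothetical helper, the data are invented, the standard deviation is the sample version (dividing by n - 1), and the quartiles use one of several possible conventions.

```python
import statistics

def spread_measures(values):
    """Return the standard deviation, IQR, and range of a list of numbers."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "sd": statistics.stdev(data),    # sample SD, divides by n - 1
        "iqr": q3 - q1,                  # spread of the middle 50%
        "range": data[-1] - data[0],     # maximum - minimum
    }

base = spread_measures([1, 2, 3, 4, 5, 6, 7, 8, 9])
extreme = spread_measures([1, 2, 3, 4, 5, 6, 7, 8, 90])  # last value replaced by an outlier

# The IQR is unchanged by the outlier; the SD and the range both grow.
print(base)
print(extreme)
```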
-
z-scores
z=(data value-mean)/SD
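
The z-score formula as a one-line function. The numbers in the example (mean 100, SD 15) are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z = z_score(130, 100, 15)
print(z)  # 2.0: the value is two standard deviations above the mean
```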
-
normal models/empirical rule
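68-95-99.7 rule: in a normal model, about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD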
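
The three percentages can be checked against the standard normal model with Python's `statistics.NormalDist`. The helper name `share_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```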
-
finding normal percentiles
- 1. identify the variables and state the problem in terms of the observed variables
- 2. standardize the values by converting to z-scores
-
finding percentile (calculator)
- with z-score: 2nd Distr --> normalcdf (lower bound, upper bound)
- without z-score: 2nd Distr --> normalcdf (lower bound, upper bound, mean, standard deviation)
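
For readers without the calculator, a rough Python stand-in for the two `normalcdf` forms is sketched below. The function here is written for this example and is not a library call; it is built on `statistics.NormalDist`. The lower bound of -1e99 is a stand-in for negative infinity, and the mean of 100 and SD of 15 are hypothetical.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal model between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# With a z-score: area below z = 1 under the standard normal model.
p_from_z = normalcdf(-1e99, 1.0)

# Without a z-score: the same area, stated in the original units.
p_from_x = normalcdf(-1e99, 115, mean=100, sd=15)

print(round(p_from_z, 4), round(p_from_x, 4))
```

Both calls describe the same point (115 is one SD above 100), so they give the same percentile.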
-
explanatory variable
defines the groups to be compared with respect to values of the response variables
-
response variable
the variable you hope to predict or explain; the outcome
-
correlation coefficient (r)
- no units
- requires quantitative variables
- -1 <= r <= 1
- r=0 represents no linear association
- r only measures the strength of a linear relationship
- not resistant to outliers
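
A sketch of computing r by hand, with two invented data sets: one lying exactly on a line and one following a perfect curve. The second case shows that r near 0 does not mean "no relationship", only no linear one. The function name `correlation` is chosen for this example.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_line = correlation([1, 2, 3, 4], [2, 4, 6, 8])            # points on a straight line
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # points on y = x^2

print(r_line, r_curve)
```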
-
least squares regression
- y-hat = b0 + b1x
- equivalently, y-hat = a + bx
- b0 (a) is the y-intercept
- b1 (b) is the slope in "y-units per x-unit"
-
residual
- observed value - predicted value (y - y-hat)
- for a least squares regression line, the sum of the residuals is always equal to 0
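
A minimal sketch that fits a least squares line to a small invented data set and then checks that the residuals sum to zero (up to floating-point rounding). The function name `least_squares` is an assumption for this example.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the intercept and slope of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope, in y-units per x-unit
    b0 = mean_y - b1 * mean_x      # the line passes through (mean_x, mean_y)
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)

# residual = observed value - predicted value
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(b0, b1)
print(sum(residuals))
```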
-
performing a simulation
- 1. identify the trial to be repeated
- 2. state how you will model the random occurrence of an outcome
- 3. explain how you will simulate the trial
- define the response variable
- 4. run several trials
- 5. summarize the results across all trials
- 6. describe what your simulation shows and draw your conclusions about the real world
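
The steps above can be sketched with a small example. Here the trial is one roll of two dice, the outcomes are modeled with a pseudorandom number generator, and the response variable is whether the sum is seven. The function name, the seed, and the number of trials are all choices made for illustration.

```python
import random

def simulate_sum_of_seven(num_trials, seed=0):
    """Estimate the chance that two dice sum to seven by repeated simulated trials."""
    rng = random.Random(seed)    # fixed seed so the run can be repeated
    hits = 0
    for _ in range(num_trials):
        # One trial: roll two dice and record the response variable.
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:
            hits += 1
    # Summarize across all trials as a relative frequency.
    return hits / num_trials

estimate = simulate_sum_of_seven(100_000)
print(estimate)
```

The estimate should land close to 1/6, which is the value obtained by counting the 6 favorable outcomes among the 36 equally likely ones.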
-
population
entire group of individuals that we hope to learn about
-
sample
a smaller group of individuals selected from the population
-
parameter
- a number that characterizes some aspect of the population such as the mean or standard deviation of some variable of the population
- denoted by Greek letters
-
statistic
- values calculated for sample data
- used to estimate values in the population (parameters)
- denoted by standard (Latin) letters
-
simple random sample
each possible sample of n individuals has an equal chance of selection
-
stratified random sample
- population is first broken up into homogeneous groups called strata
- strata have something in common that affects the response variable
-
cluster sampling
divides the population into heterogeneous groups called clusters and then takes an SRS of some of the clusters
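
The three sampling designs (simple random, stratified, cluster) can be contrasted in a short sketch. Everything about the data is hypothetical: a school of 200 students, grade levels used as strata, and mixed-grade homerooms used as clusters. The sketch assumes an SRS is taken within every stratum and that every member of a chosen cluster is included.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters and keep everyone in the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(1)
grades = {g: [f"{g}-{i}" for i in range(50)] for g in ("9th", "10th", "11th", "12th")}
everyone = [student for members in grades.values() for student in members]
# Each hypothetical homeroom mixes students from all four grades.
homerooms = {f"room{r}": everyone[r::10] for r in range(10)}

srs = simple_random_sample(everyone, 20, rng)
stratified = stratified_sample(grades, 5, rng)
clustered = cluster_sample(homerooms, 2, rng)
```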
-
bias in sampling methods
- undercoverage
- voluntary response bias
- convenience sample
- nonresponse
- response bias
-
observational study
researchers observe individuals and record variables of interest but do not impose a treatment
-
experiment
- researcher deliberately imposes a treatment
- must identify at least one response and explanatory variable
- used to determine a cause-and-effect relationship
-
block design
subjects are grouped into blocks by a shared characteristic that may affect the results of the experiment
-
matched pairs design
- a form of block design
- one subject: receives both treatments
-
control
no treatment/traditional treatment/placebo
-
lurking variables
variables that we did not think to measure but which can affect the response variable
-
randomize
reduces bias by equalizing the effects of lurking variables
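
One simple way to randomize is to shuffle the subjects and deal them out to the treatment groups in turn. The sketch below assumes 20 subjects identified by number and two groups; the function name and the seed are made up for this example.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly split subjects into treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                   # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

groups = assign_treatments(range(1, 21), ["treatment", "control"], seed=42)
print(groups)
```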
-
replicate
- should include many subjects in a comparative experiment
- experiment should be designed so that other researchers can replicate our results
-
single blind
subjects do not know which treatment group they have been assigned OR those who evaluate the results of the experiment do not know how subjects have been allocated to treatment groups
-
double blind
NEITHER the subject nor the evaluators know how the subjects have been allocated to treatment groups
-
confounding
cannot separate the effect of a treatment (explanatory variable) from the effects of other influences (confounding variables) on the response variable
-
trial
single attempt or realization of a random phenomenon (rolling a pair of dice)
-
outcome
the value measured, observed, or reported for each trial (the faces shown on the dice)
-
sample space
the set of all possible outcomes
-
event
- collection of outcomes
- usually designated by capital letters
- e.g., A = rolling a sum of seven when two dice are rolled
-
the law of large numbers
the long-run relative frequency of an outcome in repeated independent trials gets closer and closer to the true probability as the number of trials increases
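
A sketch of the law of large numbers using simulated fair coin flips. It records the relative frequency of heads at a few checkpoints so the long-run behavior can be inspected. The function name, the seed, and the checkpoints are assumptions chosen for illustration.

```python
import random

def running_relative_frequency(num_flips, checkpoints, seed=0):
    """Relative frequency of heads after each checkpoint number of fair coin flips."""
    rng = random.Random(seed)
    heads = 0
    results = {}
    for flip in range(1, num_flips + 1):
        if rng.random() < 0.5:      # one independent trial
            heads += 1
        if flip in checkpoints:
            results[flip] = heads / flip
    return results

freqs = running_relative_frequency(100_000, {10, 100, 1_000, 10_000, 100_000})
for flips in sorted(freqs):
    print(flips, freqs[flips])
```

Early checkpoints can wander noticeably from 0.5; the law only describes what happens in the long run.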
-
independence
if the occurrence of one event does not alter the probability that the other event occurs
-
complement of event A
the set of all possible outcomes that are NOT in the event A
-
disjoint
cannot occur together
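
Sample space, event, complement, and disjoint events map naturally onto Python sets. The sketch uses two dice as the example and assumes all 36 outcomes are equally likely; the event names A and B are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) outcome.
sample_space = set(product(range(1, 7), repeat=2))

A = {o for o in sample_space if sum(o) == 7}     # event: the sum is seven
B = {o for o in sample_space if sum(o) == 11}    # event: the sum is eleven

A_complement = sample_space - A                  # all outcomes NOT in A
are_disjoint = A.isdisjoint(B)                   # True: the sum cannot be 7 and 11 at once

# With equally likely outcomes, probability = outcomes in the event / all outcomes.
p_A = Fraction(len(A), len(sample_space))
p_not_A = Fraction(len(A_complement), len(sample_space))
print(p_A, p_not_A, are_disjoint)
```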