The flashcards below were created by user Melina.gonzales on FreezingBlue Flashcards.

categorical data
values that fall into separate, non-overlapping categories, such as marital status or hair color

quantitative data
numerical values with measurement units, such as dollars, degrees, or inches

five-number summary
minimum value, Q1, the median, Q3, and maximum value
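A minimal Python sketch of the five-number summary, assuming the common textbook rule that Q1 and Q3 are the medians of the lower and upper halves of the ordered data (the middle value is left out of both halves when n is odd). Other software may use slightly different quartile conventions, and `five_number_summary` is a hypothetical helper name.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values below the median
    upper = xs[half + (n % 2):]    # values above the median (skip the middle value if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (1, 2.5, 5, 7.5, 9)
```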

mean
 x̄ = (sum of values)/(number of values)
 not resistant to extreme values

median
 middle of data set when the data have been ordered
 resistant to outliers
 a more appropriate measure of center when outliers are present or distribution is skewed
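To illustrate the difference in resistance between the mean and the median, here is a small Python sketch using hypothetical income data (in thousands of dollars) with one extreme value added. The mean jumps from 35 to about 95.8, while the median only moves from 35 to 36.5.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]     # hypothetical data, in thousands of dollars
with_outlier = incomes + [400]     # add one extreme value

print(mean(incomes), median(incomes))              # center without the outlier
print(mean(with_outlier), median(with_outlier))    # mean is pulled up; median barely moves
```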

standard deviation
 measure of spread (variation)
 not resistant to outliers
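The card does not give a formula, so this sketch assumes the sample standard deviation, s = sqrt(sum of (x - x̄)² / (n - 1)). It checks the hand-written formula against Python's `statistics.stdev` and shows how a single outlier inflates the SD. `sample_sd` and the data are hypothetical.

```python
import math
import statistics

def sample_sd(data):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), statistics.stdev(data))   # the two values agree
print(sample_sd(data + [50]))                    # one outlier makes the SD much larger
```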

Interquartile Range (IQR)
 IQR = Q3 - Q1
 gives the spread of the middle 50% of the data
 resistant to outliers

range
 maximum - minimum
 single number
 extremely sensitive to outlying values
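A short Python comparison of the two spread measures on hypothetical exam scores. It assumes the "median of each half" quartile rule. When the largest score is replaced by an outlier, the range jumps from 19 to 139 while the IQR stays at 11.0. `iqr` and `data_range` are hypothetical helper names.

```python
from statistics import median

def iqr(data):
    """IQR = Q3 - Q1, with quartiles taken as medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[half + len(xs) % 2:]) - median(xs[:half])

def data_range(data):
    """Range = maximum - minimum."""
    return max(data) - min(data)

scores = [61, 64, 67, 70, 72, 75, 78, 80]   # hypothetical exam scores
extreme = scores[:-1] + [200]               # replace the maximum with an outlier

print(data_range(scores), iqr(scores))      # 19 11.0
print(data_range(extreme), iqr(extreme))    # 139 11.0
```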

z-scores
z = (data value - mean)/SD
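The z-score formula as a one-line Python function (`z_score` is a hypothetical name), with a hypothetical example: a score of 85 in a class with mean 70 and SD 10 lies 1.5 standard deviations above the mean.

```python
def z_score(value, mean, sd):
    """How many standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(85, 70, 10))   # 1.5
```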

normal models/empirical rule
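68-95-99.7%: in a Normal model, about 68% of values fall within 1 SD of the mean, about 95% within 2 SDs, and about 99.7% within 3 SDs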
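The 68-95-99.7 percentages are rounded areas under the standard Normal curve. This sketch computes them with Python's standard-library `statistics.NormalDist` (Python 3.8+), printing roughly 68.3%, 95.4%, and 99.7%.

```python
from statistics import NormalDist

z = NormalDist()   # standard Normal model: mean 0, SD 1
for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)   # proportion within k SDs of the mean
    print(f"within {k} SD: {share:.1%}")
```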

finding normal percentiles
 1. identify the variables and state the problem in terms of the observed variables
 2. standardize the values by converting to z-scores

finding percentile (calculator)
 with z-scores: 2nd DISTR > normalcdf(lower bound, upper bound)
 without z-scores: 2nd DISTR > normalcdf(lower bound, upper bound, mean, standard deviation)
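For readers without the calculator, a rough Python equivalent of `normalcdf` can be built on `statistics.NormalDist`. `normal_area` is a hypothetical helper, and the height model N(64, 2.5) is made-up data. Leaving the mean and SD at their defaults (0 and 1) mirrors the z-score form of the calculator command, and both forms give the same area.

```python
from statistics import NormalDist

def normal_area(lower, upper, mean=0.0, sd=1.0):
    """Proportion of a Normal(mean, sd) model between lower and upper, like normalcdf."""
    model = NormalDist(mean, sd)
    return model.cdf(upper) - model.cdf(lower)

# Hypothetical: heights follow N(64, 2.5); proportion between 60 and 68 inches
print(normal_area(60, 68, 64, 2.5))
# Same area after standardizing: z = (60 - 64)/2.5 = -1.6 and z = (68 - 64)/2.5 = 1.6
print(normal_area(-1.6, 1.6))
```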

explanatory variable
defines the groups to be compared with respect to values of the response variable

response variable
the variable you hope to predict or explain; the outcome

correlation coefficient (r)
 no units
 requires quantitative variables
 -1 ≤ r ≤ 1
 r = 0 represents no linear correlation
 r only measures the strength of a linear relationship
 not resistant to outliers
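A minimal sketch of r computed as the sum of products of z-scores divided by n - 1 (Python 3.10+ also offers `statistics.correlation`; the hand-written `correlation` here is a hypothetical helper). The second example shows why r only measures linear strength: the points follow a perfect curved pattern, yet r is 0 (up to floating-point rounding).

```python
import math

def correlation(x, y):
    """r = (sum of z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))   # perfectly linear: r is 1
print(correlation(x, [4, 1, 0, 1, 4]))    # perfect curve (y = (x - 3)^2), but r is 0
```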

least squares regression
 ŷ = b0 + b1x
 ŷ = a + bx
 b0 (a) is the y-intercept
 b1 (b) is the slope, in "y-units per x-unit"
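A Python sketch of fitting ŷ = b0 + b1x by least squares, using the standard formulas b1 = Sxy/Sxx (equivalently r · s_y/s_x) and b0 = ȳ - b1·x̄. The hours/score data and the `least_squares` name are hypothetical. For this data the line is ŷ = 57.5 + 4.5x, so each extra hour is associated with 4.5 more points.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx              # slope, in y-units per x-unit
    b0 = ybar - b1 * xbar       # intercept: the line passes through (x-bar, y-bar)
    return b0, b1

# Hypothetical data: hours studied (x) vs. exam score (y)
hours = [1, 2, 3, 4, 5]
score = [62, 67, 70, 76, 80]
b0, b1 = least_squares(hours, score)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")   # y-hat = 57.5 + 4.5x
```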

residual
 observed value - predicted value (y - ŷ)
 for a least-squares line, the sum of the residuals is always equal to 0
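This sketch computes residuals (observed minus predicted) for a least-squares line fitted to hypothetical data and confirms that they add up to 0. The fit is done inline with the usual slope and intercept formulas so the block stands alone.

```python
# Hypothetical data; slope and intercept come from the least-squares formulas
x = [1, 2, 3, 4, 5]
y = [62, 67, 70, 76, 80]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # observed - predicted
print(residuals)        # [0.0, 0.5, -1.0, 0.5, 0.0]
print(sum(residuals))   # 0.0
```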

performing a simulation
 1. identify the trial to be repeated
 2. state how you will model the random occurrence of an outcome
 3. explain how you will simulate the trial and define the response variable
 4. run several trials
 5. summarize the results across all trials
 6. describe what your simulation shows and draw your conclusions about the real world
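One way the simulation steps could look in Python, borrowing the dice example used elsewhere in these cards: estimate the probability that two dice sum to seven. `simulate_sum_of_seven` is a hypothetical name, and the fixed seed is only there so a run can be reproduced. The estimate should land near the exact value 1/6 ≈ 0.167.

```python
import random

def simulate_sum_of_seven(trials, seed=None):
    """Estimate P(sum of two dice = 7) by repeating a simulated trial."""
    rng = random.Random(seed)   # seeded generator so results can be reproduced
    hits = 0
    for _ in range(trials):     # step 4: run many trials
        # steps 2-3: model each die as a random integer 1-6; one trial = roll two dice
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:          # response variable: did the sum equal 7?
            hits += 1
    return hits / trials        # step 5: summarize across all trials

estimate = simulate_sum_of_seven(10_000, seed=1)
print(estimate)   # step 6: compare with the exact probability 1/6
```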

population
entire group of individuals that we hope to learn about

sample
a smaller group of individuals selected from the population

parameter
 a number that characterizes some aspect of the population such as the mean or standard deviation of some variable of the population
 usually denoted by Greek letters (e.g., μ for the mean, σ for the standard deviation)

statistic
 values calculated from sample data
 used to estimate values in the population (parameters)
 usually denoted by Latin letters (e.g., x̄ for the mean, s for the standard deviation)

simple random sample
each possible sample of n individuals has an equal chance of selection
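Python's `random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches this definition of an SRS. The roster of 500 students is hypothetical.

```python
import random

population = [f"student_{i}" for i in range(1, 501)]   # hypothetical population of 500
rng = random.Random(42)                                # seeded for reproducibility
srs = rng.sample(population, 25)                       # SRS of n = 25, no repeats
print(srs[:5])
```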

stratified random sample
 population is first broken up into homogeneous groups called strata
 individuals within a stratum share something that affects the response variable
 an SRS is then taken within each stratum
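A minimal stratified-sampling sketch: a separate SRS is drawn from each stratum. It assumes equal allocation (the same number from every stratum); proportional allocation is another common choice. The class-year strata and `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of n_per_stratum individuals from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical strata: students grouped by class year
strata = {
    "freshman": [f"F{i}" for i in range(200)],
    "sophomore": [f"S{i}" for i in range(150)],
    "junior": [f"J{i}" for i in range(120)],
    "senior": [f"R{i}" for i in range(100)],
}
sample = stratified_sample(strata, 10, seed=7)
print({year: len(chosen) for year, chosen in sample.items()})   # 10 from each year
```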

cluster sampling
divides the population into heterogeneous groups called clusters and then takes an SRS of some of the clusters
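For contrast with stratified sampling, this sketch picks an SRS of whole clusters and then includes every individual in the chosen clusters. The 20 homerooms of 25 students each are hypothetical.

```python
import random

# Hypothetical clusters: 20 homerooms, each a mixed group of 25 students
clusters = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(25)] for r in range(1, 21)}

rng = random.Random(3)
chosen_rooms = rng.sample(sorted(clusters), 4)   # SRS of 4 clusters out of 20
sample = [student for room in chosen_rooms for student in clusters[room]]   # everyone in them
print(chosen_rooms, len(sample))                 # 4 rooms, 100 students
```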

bias in sampling methods
 undercoverage
 voluntary response bias
 convenience sample
 nonresponse
 response bias

observational study
researchers observe individuals and record variables of interest but do not impose a treatment

experiment
 researcher deliberately imposes a treatment
 must identify at least one explanatory variable and one response variable
 used to determine a cause-and-effect relationship

block design
subjects are grouped into blocks by a shared characteristic that may affect the results of the experiment; treatments are then randomly assigned within each block

matched pairs design
 a form of block design
 each subject receives both treatments (or closely matched subjects are paired and each member of a pair receives a different treatment)

control
no treatment/traditional treatment/placebo

lurking variables
variables that we did not think to measure but which can affect the response variable

randomize
reduces bias by equalizing the effects of lurking variables
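A sketch of random assignment: shuffle the subjects, then deal them into groups. `randomize` and the 20 subject labels are hypothetical. Because the order is random, lurking variables tend to be balanced across the groups on average.

```python
import random

def randomize(subjects, groups=("treatment", "control"), seed=None):
    """Randomly assign subjects to groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)   # random order, unrelated to any subject characteristic
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

assignment = randomize([f"subject_{i}" for i in range(1, 21)], seed=11)
print({g: len(members) for g, members in assignment.items()})   # 10 per group
```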

replicate
 a comparative experiment should include many subjects
 experiment should be designed so that other researchers can replicate our results

single blind
subjects do not know which treatment group they have been assigned to OR those who evaluate the results of the experiment do not know how subjects have been allocated to treatment groups

double blind
NEITHER the subject nor the evaluators know how the subjects have been allocated to treatment groups

confounding
cannot separate the effect of a treatment (explanatory variable) from the effects of other influences (confounding variables) on the response variable

trial
single attempt or realization of a random phenomenon (rolling a pair of dice)

outcome
the value measured, observed, or reported for each trial (the faces shown on the dice)

sample space
the set of all possible outcomes

event
 collection of outcomes
 usually designated by capital letters
 e.g., A = rolling a sum of seven when two dice are rolled

the law of large numbers
the long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases
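A Python sketch of the law of large numbers using two dice: it tracks the relative frequency of a sum of seven and prints it at a few checkpoints. The values tend to settle toward 1/6 ≈ 0.167, although they need not move closer at every single step. The seed and checkpoints are arbitrary choices.

```python
import random

rng = random.Random(2024)   # arbitrary seed, for reproducibility
hits = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    if rng.randint(1, 6) + rng.randint(1, 6) == 7:   # one independent trial
        hits += 1
    if n in checkpoints:
        print(n, hits / n)   # relative frequency of "sum is 7" so far
final = hits / 100_000
```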

independence
two events are independent if the occurrence of one event does not alter the probability that the other event occurs

complement of event A
the set of all possible outcomes that are NOT in the event A

disjoint
cannot occur together
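These probability ideas can be checked by listing the 36 equally likely outcomes of rolling two dice. This sketch uses exact fractions. The events C and D are hypothetical examples added for illustration. It confirms that P(A) = 1/6, that P(not A) = 1 - P(A) = 5/6, that the disjoint events "sum is 7" and "sum is 2" have probability 0 of occurring together, and that "first die is even" and "second die shows 3" satisfy P(C and D) = P(C)·P(D), so they are independent.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = (number of outcomes in the event) / (number of outcomes in the sample space)."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o):
    return o[0] + o[1] == 7   # event A: the sum is seven

def B(o):
    return o[0] + o[1] == 2   # event B: the sum is two

def C(o):
    return o[0] % 2 == 0      # event C: the first die is even

def D(o):
    return o[1] == 3          # event D: the second die shows a 3

print(prob(A))                                   # 1/6
print(prob(lambda o: not A(o)))                  # complement: 5/6
print(prob(lambda o: A(o) and B(o)))             # disjoint: 0
print(prob(lambda o: C(o) and D(o)) == prob(C) * prob(D))   # independent: True
```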

