Statistics is the science of...
collecting, organizing, summarizing, analyzing and interpreting data in order to make decisions

biostatistics is statistics for...
biomedical applications

The two branches of statistics are...
descriptive statistics and inferential statistics

Descriptive statistics involves....
organization, summarization and presentation of data

Inferential statistics involves...
using a sample to draw conclusions about a population

Two kinds of variability are...
 explained (or attributable) variability
 unexplained variability ("noise")

Two things that unexplained variability in data leads to are...
 uncertainty about conclusions drawn from data
 unpredictability of the next observation or measurement

approximation is when you use...
a simpler idea, object or representation to stand for the more complex one of interest

when you use approximation you gain...
convenience, feasibility, reduced cost or effort and clarity

when you use approximation you lose...
some characteristics or information in the original

summarization is one form of...
approximation

models are _______ to the real world
approximations

models are an...
abstract representation of some phenomenon or process

models always miss....
some properties of the original

statistical models explicitly recognize the presence of...
unattributable variability in data

A population is...
the collection of all responses, measurements, or counts that are of interest

A sample is a...
subset of a population

A census is...
a collection of data from every member of the population

What are the four reasons to use a sample?
 the population is too large to obtain data
 saves time and money
 sometimes units are destroyed in measurement
 all members of a population may be difficult to contact

a disadvantage to using a sample is...
sampling error (the sample may not exactly represent the population)

A parameter is...
a numerical description of a population characteristic

A statistic is...
a numerical description of a sample characteristic

What are the 5 kinds of sampling techniques?
 simple random sampling
 stratified sampling
 cluster sampling
 systematic sampling
 convenience sampling

every member of the population has an equal chance of being selected in what kind of sampling?
simple random sampling

What sampling is analogous to putting everyone's name in a hat and drawing out names at random?
simple random sampling
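A minimal Python sketch of simple random sampling, assuming the population is already a list. The standard-library `random.sample` draws without replacement, so every member has the same chance of being picked, like drawing names from a hat. The function name and the numbered subjects are hypothetical.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    # random.sample never picks the same member twice, like names drawn from a hat
    return rng.sample(population, n)

# Hypothetical population of 100 numbered subjects
subjects = list(range(1, 101))
sample = simple_random_sample(subjects, 10, rng=random.Random(42))  # seeded for reproducibility
print(sample)
```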

What is multistage random sampling?
When subgroups of the population are first randomly selected and then, within each subgroup, simple random sampling is done.

When a population is divided into groups (strata) according to some characteristic it is called ________ sampling.
stratified

The sampling in which, a simple random sample is selected from each group and then combined to form a final sample is...
stratified sampling
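A rough sketch of stratified sampling under stated assumptions: the hypothetical `stratum_of` function says which stratum a member belongs to, and the same number of members is drawn from every stratum (equal allocation; drawing in proportion to stratum size is another common choice). The subjects and strata are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Group members into strata, take a simple random sample from each, then combine."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # equal allocation (an assumption): the same number from every stratum
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical subjects as (id, sex); sex is the stratifying characteristic
subjects = [(i, "F" if i % 2 else "M") for i in range(1, 41)]
sample = stratified_sample(subjects, lambda s: s[1], 5, rng=random.Random(1))
print(sample)
```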

What kind of sampling is useful when the population falls into subgroups that each have similar characteristics?
cluster sampling

What is the sampling in which the final sample consists of all members of one or more of the groups?
cluster sampling
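A minimal sketch of cluster sampling, assuming the clusters are given as a dictionary from a (hypothetical) cluster name to its members: whole clusters are chosen at random, and every member of each chosen cluster goes into the sample. The clinic names and patients are made up.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted keys give a reproducible order
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clinics (clusters), each with its own patients
clinics = {
    "clinic_a": ["a1", "a2", "a3"],
    "clinic_b": ["b1", "b2"],
    "clinic_c": ["c1", "c2", "c3", "c4"],
    "clinic_d": ["d1", "d2"],
}
sample = cluster_sample(clinics, 2, rng=random.Random(7))
print(sample)
```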

In which sampling is each member of the population assigned a number, with the members then put in order?
systematic sampling

In which sampling is the final sample made up of every $k$th member?
systematic sampling
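A sketch of systematic sampling, assuming the population list is already numbered and ordered and that the step `k` is chosen beforehand (often population size divided by the desired sample size). A random starting point is picked among the first `k` members, then every `k`th member after it is taken. The function name and data are hypothetical.

```python
import random

def systematic_sample(ordered_population, k, rng=random):
    """Random start among the first k members, then every k-th member after it."""
    start = rng.randrange(k)  # 0-based index of the randomly chosen starting member
    return ordered_population[start::k]

subjects = list(range(1, 101))  # hypothetical, already numbered and ordered
sample = systematic_sample(subjects, 10, rng=random.Random(3))
print(sample)
```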

Which sampling technique is the easiest but the worst?
convenience sampling

In which sampling is only readily available data used?
convenience sampling

Which sampling is often not representative of the population?
convenience sampling

Every measurement obtained in a study should always have good ______ and ______.
validity and reliability

Validity is related to concepts of...
accuracy and bias

Reliability is related to concepts of...
precision and variation

Three sources of measurement are...
readings, ratings and reports

Readings come from...
readings on lab equipment

"Ratings" are when...
"Judges" are used to assess the condition of subjects using predefined criteria

"Reports" refer to...
when subjects provide their own recollections and reports of symptoms, conditions and performances.

Almost all worthwhile scientific studies involve comparison of one or more groups with respect to a....
response variable

response variables are usually determined at the end of...
the subject's participation in the study

Response variables relate, in varying degrees, to...
the primary and secondary questions

In a randomized clinical trial, subjects are randomized to......
treatment groups

In a randomized clinical trial, the groups must be similar except for.....
the treatment being received.

What kind of trial is the best scientific approach for a comparative study?
Randomized clinical trial

Why is the randomized clinical trial considered to be the best scientific approach for a comparative study?
Because randomization makes the groups similar except for the treatment, observed differences can be attributed to the treatment.

Randomized clinical trial can sometimes be ______ or ______ impossible to conduct.
ethically or practically

In randomized clinical trials, there needs to be a link between ______ and _____.
efficacy and effectiveness

Efficacy is...
how well the therapy works under ideal conditions

Effectiveness is...
how well the therapy works in a real-world setting

An uncontrolled trial generally provides a ________ view of therapy
distorted

Sometimes a "no-treatment" control group is not...
ethical

Historical controls deal with...
using a comparison group obtained from the medical records of similar subjects.

The advantages of using historical controls are...
 it's cheap and simple
 all subjects receive the new treatment, which investigators believe is superior

The disadvantages of using historical controls are...
 the quality and availability of historical data
 criteria of response may change over time
 ancillary patient care improves over time

Using historical controls is appropriate if the disease is...
uniformly fatal until a new drug becomes available

When using historical controls a decline in fatality would signify what?
that the treatment works

What are the four reasons that treatment and control groups may differ?
 sampling variability or chance
 inherent differences between treatment and control subjects
 Differences in the handling and evaluation of the treatment and control groups during the course of the investigation
 True effect of the new procedure

A good experimental design will reduce, if not eliminate what two factors?
 inherent differences between treatment and control subjects
 Differences in the handling and evaluation of the treatment and control groups during the course of the investigation

Which design is the simplest, most used design?
Parallel group design

In which design are subjects randomized to groups and followed in a parallel fashion?
parallel group design

in parallel group design, each subject gets how many treatment assignments?
one

in which design do subjects act as their own control?
crossover design

in which design are generally fewer subjects needed?
crossover design

In crossover design, subject differences do not interfere with...
treatment comparisons

in the crossover design, subjects are randomized to order of.....
administration

In which design is there a "washout" period so that the first treatment a subject receives leaves the system?
crossover design

What two problems are always a concern with the crossover design?
 carryover effects
 dropouts

What kind of therapies are crossover designs appropriate for?
therapies that may offer short-term relief of signs or symptoms rather than a cure for a condition

In the crossover design, comparisons are made between groups based on...
how we intended to treat the subject (the intention-to-treat principle)

Our comparisons are based on how we intended to treat the subject because...
subjects may not always fully comply with the assigned treatment.

Three examples of observational studies are...
 case-control study
 cohort study
 cross-sectional study

In which study are characteristics of a sample observed at one point in time?
cross-sectional study

What is the advantage of a cross-sectional study?
it is quick and cheap

What are the disadvantages of a cross-sectional study?
 it is often difficult to be sure that exposure precedes disease
 it measures only prevalence (not incidence)

Which study is often called a "prospective study"?
cohort study

What is a cohort study?
a study in which a group of subjects is classified according to characteristics that might be related to an outcome and then followed over time to observe the outcome

What is the advantage to a cohort study?
good for rare exposures

What are the disadvantages of a cohort study?
 time-consuming
 expensive
 good follow-up is difficult

Why might good follow-up in a cohort study be difficult to obtain?
subject dropouts

What is a case-control study characterized by?
the identification of the two study groups on the basis of the presence or absence of the outcome of interest, and by retrospective observation of antecedent factors under study

In a case-control study, the controls must be representative of...
the population from which the cases came

A matched case-control study is when...
a control is matched to every case by certain factors, so that the two groups will be more similar

What are the advantages of the case-control study?
 good for rare diseases
 fast
 inexpensive
 can simultaneously examine several antecedent factors of interest

What are the disadvantages of the case-control study?
 not good for rare exposures
 indirect way of assessing effects

Cohort and case-control studies are considered to be longitudinal, meaning...
the subjects are studied at more than one time

What is confounding?
a mixing of the effect of a third factor into the exposure-response relationship

Confounding can be controlled in the analysis if...
information on the confounders is available

What is a bias?
a condition, tendency or inclination that prevents a fair comparison of groups

A selection bias is...
when the decision to admit a subject to the trial, or to assign the subject to a group, is affected by knowledge of how the subject may respond to treatment

In a volunteer bias, subjects who volunteer, or refuse new treatments may be...
very different

many studies show that _____ tend to be healthier than the general population and are more likely to comply with medical recommendations.
volunteers

The placebo effect creates a ______ bias.
response

what is a placebo effect?
knowledge that a patient is being treated affects the patient's response to treatment

if a placebo is being used, subjects should be ______ to the treatment if possible.
blinded

Assessment bias is when knowledge of...
the treatment received by the subject affects the researcher's assessment

a kind of bias in which an observer may be unconsciously prejudiced is...
assessment bias

Assessment bias may be avoided by...
double blinding

Double blinding is...
when neither the subject nor the assessor knows which treatment the subject is receiving

A data set is...
a set of values of one or more variables for a collection of individuals or units

a binary endpoint variable is...
a variable that classifies the members of a set into two groups according to whether or not they have some property (e.g., sick or not sick)

a nominal endpoint variable is...
 classification into categories: subjects are classified into different qualitative categories with no natural order
 values may be coded as numbers, but the numbers are simply labels

an ordinal endpoint variable is ...
classification into categories that have a natural order or rank (1st, 2nd, 3rd, etc.)

a discrete endpoint variable is....
if the values / observations belonging to it are distinct and separate, i.e., they can be counted (1, 2, 3, ...)

a continuous endpoint variable is...
if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data (for example, height, weight, temperature).

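You can always convert a variable from a more informative scale to a less informative one, but not vice versa (e.g., exact ages can be grouped into ordinal age classes, but the classes cannot be turned back into exact ages).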

What is the tabular display?
Frequency distribution

What are the graphical displays?
Frequency histogram, dot plot, stem-and-leaf plot, pie chart, Pareto chart, scatter plot, time series chart

Frequency distributions are summary tables for a categorical variable

The relative frequency is...
Proportion of observations that take each value

a list of relative frequencies is often called the...
distribution of the variable

Frequency distribution shows how observed values are distributed across...
possible values.

For what kinds of data are there sometimes too many values to list in a frequency distribution?
continuous and discrete

When using frequency distributions, it is good to use cumulative frequency and cumulative relative frequency when you have what kind of data?
ordinal or quantitative data

Cumulative frequency
running sum of frequencies

cumulative relative frequency
running sum of relative frequencies

the cumulative relative frequency gives proportion of observations that are...
less than or equal to the value
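A small sketch of a frequency table with cumulative columns for ordered data, using `collections.Counter` and `itertools.accumulate`. The pain scores are hypothetical. The cumulative relative frequency is computed as cumulative count divided by n, rather than by adding up rounded proportions.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Rows of (value, frequency, relative freq., cumulative freq., cumulative relative freq.)."""
    n = len(values)
    counts = Counter(values)
    levels = sorted(counts)                  # assumes ordered (ordinal or quantitative) values
    freqs = [counts[v] for v in levels]
    cum_freqs = list(accumulate(freqs))      # running sum of frequencies
    return [
        (v, f, f / n, c, c / n)              # c / n = proportion of observations <= v
        for v, f, c in zip(levels, freqs, cum_freqs)
    ]

# Hypothetical pain scores on an ordinal 1-5 scale
scores = [1, 2, 2, 3, 3, 3, 4, 5, 5, 5]
for row in frequency_table(scores):
    print(row)
```

Sorting the levels is what makes the running sums meaningful; for nominal categories there is no natural order, so the cumulative columns would not make sense.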

Cumulative relative frequency is not sensible for what kind of data?
nominal

A histogram is a...
graphical display of the frequencies or relative frequencies in a frequency distribution

The horizontal scale of the histogram is
quantitative

the vertical scale of the histogram measures...
frequency or relative frequency

[Figure: example of a histogram]

The dot plot is a one-way...
scatter plot

a dot plot is an alternative to the...
histogram

[Figure: example of a dot plot]

A pie chart shows a relative frequency of...
qualitative data values

In a pie chart, the area of each category is proportional to...
the corresponding relative frequency

Angle for a category in a pie chart =
relative frequency × 360°
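A one-function sketch of the angle rule, with hypothetical blood-type counts; the function name is illustrative.

```python
def pie_angles(counts):
    """Angle in degrees for each category: relative frequency x 360."""
    total = sum(counts.values())
    return {category: count / total * 360 for category, count in counts.items()}

# Hypothetical blood-type counts (16 subjects)
angles = pie_angles({"O": 8, "A": 6, "B": 1, "AB": 1})
print(angles)  # {'O': 180.0, 'A': 135.0, 'B': 22.5, 'AB': 22.5}
```

Because the relative frequencies add up to 1, the angles add up to 360°.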

A Pareto chart is...
a bar chart, similar to a histogram, but used for qualitative data

in a pareto chart, the vertical axis may represent...
frequency or relative frequency

In a Pareto chart, the bars are ordered from...
largest to smallest frequency

[Figure: example of a Pareto chart]

A graph for bivariate data is...
scatter plot

In a scatter plot, quantitative data is on which axis?
both of them

Each point of a scatter plot represents...
one observation

A scatter plot can show the relationship between...
two variables

[Figure: example of a scatter plot]

A time series chart is for....
entries taken regularly over time

A time series chart is useful for...
identifying trends

[Figure: example of a time series chart]

The mode is not useful for ______ data.
continuous

the mode is useful for _____ data
discrete (ex. Counts)

the mean is only appropriate for ______ data
quantitative

Outliers are values which are...
different from the bulk of the data

the mean is very ______, always pulled in the direction of outliers.
sensitive

The ____ is less sensitive to outliers than the mean.
median
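A quick illustration with made-up numbers, using Python's `statistics` module: replacing one value with an outlier moves the mean a lot but leaves the median alone.

```python
import statistics

typical = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]   # same data with 7 replaced by an outlier (hypothetical)

for data in (typical, with_outlier):
    print("mean:", statistics.mean(data), "median:", statistics.median(data))
```

Here the mean goes from 5.4 to 18, while the median stays at 5.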

The average of the deviations from the mean always equals...
zero
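The reason is that positive and negative deviations cancel out, which is why measures of spread square the deviations instead. A tiny check with hypothetical data (with non-integer data, floating-point rounding may leave a value very close to, rather than exactly, 0):

```python
import statistics

data = [2, 4, 9]                          # hypothetical observations
mean = statistics.mean(data)              # 5
deviations = [x - mean for x in data]     # [-3, -1, 4]
average_deviation = sum(deviations) / len(deviations)
print(average_deviation)                  # 0.0
```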

The advantages of the sample standard deviation are...
 uses every observation
 mathematically manageable

disadvantage to using standard deviation
sensitive to extreme observations

The sample variance is...
the "average" of the squared deviations from the mean: the sum of squared deviations divided by n - 1
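A minimal sketch of the sample variance and standard deviation computed by hand with the n - 1 divisor, then checked against `statistics.variance` and `statistics.stdev` (which also use n - 1). The data are hypothetical.

```python
import math
import statistics

data = [2, 4, 9]                           # hypothetical observations
n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]
variance = sum(squared_devs) / (n - 1)     # sample variance divides by n - 1
sd = math.sqrt(variance)                   # SD is the square root of the variance

print(variance, sd)                        # 13.0 and about 3.606
print(statistics.variance(data), statistics.stdev(data))  # library check
```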

The variance/Standard deviation is always...
greater than or equal to 0

In terms of (variance/SD) a larger value means _______ variability.
more

for (variance/SD) = 0, there is no...
variability (all data have the same value)

Comparing two or more variances is only meaningful when they are in...
the same units

the coefficient of variation is ...
a unitless measure of relative variability

the coefficient of variation can be used to...
compare the relative variation between any sets of values

The advantages of the CV are...
it is dimensionless, independent of the units used

The disadvantages of the CV are...
it is statistically awkward, and it can only be used when the mean is greater than 0
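A sketch using the usual definition CV = sample SD / mean, expressed as a percentage (the formula itself is not stated on these cards). Measuring the same hypothetical heights in centimetres and in metres gives the same CV, which is what "unitless" means here.

```python
import statistics

def coefficient_of_variation(data):
    """CV = sample SD / mean, as a percentage; refuses a mean that is not positive."""
    mean = statistics.mean(data)
    if mean <= 0:
        raise ValueError("CV requires a mean greater than 0")
    return statistics.stdev(data) / mean * 100

heights_cm = [150, 160, 170]               # hypothetical heights
heights_m = [h / 100 for h in heights_cm]  # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)                         # both about 6.25: the units cancel
```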

p =
the percentage of the data less than or equal to the desired percentile, divided by 100 (e.g., for the 25th percentile, p = 25/100 = 0.25)

percentiles are also called...
quantiles

i =
the position of the observation in the ordered data (1st, 2nd, etc.)

if $i$ is an integer, then the $100p$th percentile =
$x_i$

if $i$ is not an integer, then the $100p$th percentile =
$x_k + (i - k)(x_{k+1} - x_k)$

k =
the integer part of $i$
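A sketch of the interpolation rule above. The cards do not say how the position i is computed; this code ASSUMES i = p(n + 1), one common convention, which also matches the default "exclusive" method of Python's `statistics.quantiles`. Other textbooks and software use different conventions and can give slightly different percentiles. The function name and data are hypothetical.

```python
import statistics

def percentile(data, pct):
    """100p-th percentile (p = pct / 100) by the interpolation rule on the cards.

    ASSUMPTION: the position is i = p * (n + 1); the cards do not say how i is found.
    """
    x = sorted(data)
    n = len(x)
    i = pct * (n + 1) / 100           # i = p(n + 1), computed this way to limit rounding
    if not 1 <= i <= n:
        raise ValueError("percentile cannot be determined for this sample size")
    k = int(i)                        # integer part of i
    if i == k:                        # i is an integer: the percentile is x_i
        return float(x[k - 1])        # x_i is 1-based; Python lists are 0-based
    # i is not an integer: x_k + (i - k)(x_{k+1} - x_k)
    return x[k - 1] + (i - k) * (x[k] - x[k - 1])

data = [2, 4, 4, 5, 7, 9, 10, 12]     # hypothetical, n = 8
print([percentile(data, q) for q in (25, 50, 75)])
print(statistics.quantiles(data, n=4))  # default "exclusive" method, same positions
```

For this data both lines give the quartiles 4.0, 6.0 and 9.75 (for example, Q1 has i = 0.25 × 9 = 2.25, so k = 2).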

Quartiles are...
percentiles which divide the distribution into four equal parts

lower quartile = Q1= ___ percentile
25th

middle quartile = Q2= ____percentile
50th

upper quartile = Q3= ____percentile
75th

Range =
largest observation - smallest observation (max - min)

the advantage of range is that..
it is easily determined

the disadvantages of Range are..
 only based on two values
 depends on the number of observations
 usually too sensitive to extreme observations to be useful

As observations are added, the range never ______.
decreases

IQR (interquartile range) is based on...
the middle 50% of the data (IQR = Q3 - Q1)

the advantages to IQR are....
 not sensitive to extreme values
 independent of n

the disadvantages of IQR are...
 cannot be determined for small n
 does not directly use majority of data
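A sketch comparing the IQR (Q3 - Q1) with the range on hypothetical data, using `statistics.quantiles` with its default method for the quartiles (quartile conventions differ, so a course's Q1 and Q3 may differ slightly). Replacing the largest value with an extreme one changes the range but not the IQR.

```python
import statistics

def iqr(data):
    """Q3 - Q1, with quartiles from statistics.quantiles (default method: an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

def data_range(data):
    return max(data) - min(data)

original = [2, 4, 4, 5, 7, 9, 10, 12]    # hypothetical
extreme = [2, 4, 4, 5, 7, 9, 10, 40]     # largest value replaced by an extreme one

print(data_range(original), data_range(extreme))
print(iqr(original), iqr(extreme))
```

The range jumps from 10 to 38, while the IQR stays at 5.75.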

a boxplot is...
a graphical summary of the distribution of data for a single variable

a boxplot can also be called a...
box and whiskers plot

the three parts to the boxplot are...
 "box" covers interval from Q1 to Q3
 whiskers extend from box to furthest observation within 1.5 x IQR from box
 Observations beyond whiskers shown individually
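A sketch of the numbers behind a boxplot, again assuming quartiles from `statistics.quantiles` (default method). It returns the box (Q1, median, Q3), the whisker ends as the furthest observations within 1.5 × IQR of the box, and the observations beyond the whiskers. The function name and data are hypothetical.

```python
import statistics

def boxplot_summary(data):
    """Box, whisker ends and individually plotted points for a boxplot."""
    x = sorted(data)
    q1, median, q3 = statistics.quantiles(x, n=4)
    step = 1.5 * (q3 - q1)                      # 1.5 x IQR
    low_fence, high_fence = q1 - step, q3 + step
    inside = [v for v in x if low_fence <= v <= high_fence]
    return {
        "box": (q1, median, q3),
        "whiskers": (inside[0], inside[-1]),    # furthest observations within 1.5 x IQR
        "outliers": [v for v in x if v < low_fence or v > high_fence],  # shown individually
    }

summary = boxplot_summary([2, 4, 4, 5, 7, 9, 10, 40])
print(summary)
```

For this data the box is (4.0, 6.0, 9.75), the whiskers run from 2 to 10, and 40 is shown individually.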

a boxplot summarizes data near the center of the distribution and shows individual observations far from center

The number of peaks in the data is the number of...
modes

one peak in data =
unimodal

two peaks in data =
bimodal

symmetric data=
data values are mirrored about a central value; one side is the mirror image of the other

skewed data=
data values are more spread out in one direction than another

right skew means it is right tailed and positively skewed.

in the right skew..
mean > median

left skew means it is left tailed and negatively skewed.

in the left skew..mean ___ median
mean < median

skewness pulls the mean in the direction of....
the tail

a symmetric and unimodal graph means...
mean = median

A bimodal graph might occur because of...
random chance, or a sample made up of two distinct subgroups

multiplying every observation by positive constant C does what three things?
 multiplies mean, median by C
 multiplies SD, Range, IQR by C
 multiplies variance by $C^2$

adding a constant C to every observation does what two things...
 Adds C to mean, median
 Does not change measures of spread (Range, IQR, Variance, SD)

What do linear transformations (adding or multiplying by a constant) change?
location, or location and spread, but not the shape of the distribution
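A check of these linear-transformation rules on hypothetical data, using the `statistics` module; `C = 3` is an arbitrary positive constant and `summary` is an illustrative helper name.

```python
import statistics

def summary(data):
    """Location and spread measures used on the cards."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),
        "sd": statistics.stdev(data),
        "range": max(data) - min(data),
    }

data = [2, 4, 9]                          # hypothetical observations
C = 3                                     # arbitrary positive constant
base = summary(data)
scaled = summary([C * x for x in data])   # every observation multiplied by C
shifted = summary([x + C for x in data])  # C added to every observation

for key in base:
    print(f"{key}: original={base[key]}, times C={scaled[key]}, plus C={shifted[key]}")
```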

nonlinear transformations change the ___ of a distribution
shape

nonlinear transformations are useful when...
modeling data
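One common nonlinear transformation is the logarithm (chosen here as an example; the cards do not name one). On hypothetical right-skewed data, taking `math.log10` turns a long right tail into evenly spaced values, so the mean moves from far above the median to (up to rounding) equal to it.

```python
import math
import statistics

right_skewed = [1, 10, 100, 1000, 10000]     # hypothetical data with a long right tail
logged = [math.log10(x) for x in right_skewed]

print(statistics.mean(right_skewed), statistics.median(right_skewed))  # mean far above median
print(statistics.mean(logged), statistics.median(logged))              # mean and median coincide
```

A linear transformation such as multiplying by 10 would keep the mean above the median; only the nonlinear one changes the shape.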

