1. Statistics is the science of...
collecting, organizing, summarizing, analyzing and interpreting data in order to make decisions
2. biostatistics is statistics for...
biomedical applications
3. The two branches of statistics are...
descriptive statistics and inferential statistics
4. Descriptive statistics involves....
organization, summarization and presentation of data
5. Inferential statistics involves...
using a sample to draw conclusions about a population
6. Two kinds of variability are...
• explained (or attributable) variability
• unexplained variability ("noise")
7. Two things that unexplained variability in data leads to are...
• uncertainty about conclusions drawn from data
• unpredictability of the next observation or measurement
8. approximation is when you use...
a simpler idea, object or representation to stand for the more complex one of interest
9. when you use approximation you gain...
convenience, feasibility, reduced cost or effort and clarity
10. when you use approximation you lose...
some characteristics or information in the original
11. summarization is one form of...
approximation
12. models are _______ to the real world
approximations
13. models are an...
abstract representation of some phenomenon or process
14. models always miss....
some properties of the original
15. statistical models explicitly recognize the presence of...
unattributable variability in data
16. A population is...
the collection of all responses, measurements, or counts that are of interest
17. A sample is a...
subset of a population
18. A census is...
a complete collection of the population
19. What are the four reasons to use a sample?
• the population is too large to obtain data
• saves time and money
• sometimes units are destroyed in measurement
• all members of a population may be difficult to contact
20. a disadvantage to using a sample is...
having some error
21. A parameter is...
a numerical description of a population characteristic
22. A statistic is..
a numerical description of a sample characteristic
23. what are the 5 kinds of sampling techniques
• simple random sampling
• stratified sampling
• cluster sampling
• systematic sampling
• Convenience Sampling
24. every member of the population has an equal chance of being selected in what kind of sampling?
simple random sampling
25. What sampling is analogous to putting everyone's name in a hat and drawing out names at random?
simple random sampling
26. What is multi-stage random sampling?
When subgroups of the population are first randomly selected and then, within each subgroup, simple random sampling is done.
27. When a population is divided into groups (strata) according to some characteristic it is called ________ sampling.
stratified
28. The sampling in which a simple random sample is selected from each group and the samples are then combined to form a final sample is...
stratified sampling
29. What kind of sampling is useful when the population falls into subgroups that each have similar characteristics?
cluster sampling
30. What is the sampling in which the final sample consists of all members of one or more of the groups?
cluster sampling
31. In what sampling is each member of the population assigned a number and the members then ordered?
systematic sampling
32. In which sampling is the final sample made up of every kth member?
systematic sampling
33. What is the easiest sampling technique, but also the worst?
convenience sampling
34. In which sampling is only readily available data used?
convenience sampling
35. Which sampling is often not representative of the population?
convenience sampling
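The four probability-based techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject IDs, the two groups "A" and "B", and all function names are hypothetical, and the fixed seed is there only to make the run repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    # Every member has an equal chance of selection ("names in a hat").
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a simple random sample from EACH stratum, then combine them.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep ALL members of each chosen one.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, k, rng):
    # Members are ordered; pick a random start, then take every k-th member.
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(0)                 # fixed seed (assumption, for repeatability)
population = list(range(1, 101))       # hypothetical subject IDs 1..100
groups = {"A": population[:50], "B": population[50:]}  # hypothetical groups

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(groups, 5, rng)
clus = cluster_sample(groups, 1, rng)
syst = systematic_sample(population, 10, rng)
```

Note the contrast between the stratified and cluster versions: the first takes a few members from every group, the second takes every member from a few groups.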
36. Every measurement obtained in a study should always have good ______ and ______.
validity and reliability
37. Validity is related to concepts of...
accuracy and bias
38. Reliability is related to concepts of...
precision and variation
39. Three sources of measurement are...
• ratings
• reports
41. "Ratings" are when...
"Judges" are used to assess the condition of subjects using predefined criteria
42. "Reports" refer to...
when subjects provide their own recollections and reports of symptoms, conditions and performances.
43. Almost all worthwhile scientific studies involve comparison of one or more groups with respect to a....
response variable
44. response variables are usually determined at the end of...
the subject's participation in the study
45. response variables relate, in varying degrees, to ....
the primary and secondary questions
46. In a randomized clinical trial, subjects are randomized to......
treatment groups
47. In a randomized clinical trial, the groups must be similar except for.....
48. What kind of trial is the best scientific approach for a comparative study?
Randomized clinical trial
49. Why is the randomized clinical trial considered to be the best scientific approach for a comparative study?
Because the differences observed can be attributed to treatment.
50. Randomized clinical trial can sometimes be ______ or ______ impossible to conduct.
ethically or practically
51. In randomized clinical trials, there needs to be a link between ______ and _____.
efficacy and effectiveness
52. Efficacy is..
how well the therapy works under ideal conditions
53. Effectiveness is...
how well therapy works in a real-world setting
54. An uncontrolled trial generally provides a ________ view of therapy
distorted
55. Sometimes a "no-treatment" control group is not....
ethical
56. Historical controls deal with...
using a comparison group obtained from the medical records of similar subjects.
57. The advantages to using historical controls are..
• it is cheap and simple
• all subjects receive the new treatment, which investigators believe is superior
58. The disadvantages to using historical controls are..
• the quality and availability of historical data
• Criteria of response may change
• Ancillary patient care improves
59. Using historical controls proves to be appropriate if the disease is...
uniformly fatal initially and a new drug becomes available
60. When using historical controls a decline in fatality would signify what?
that the treatment works
61. what are the four reasons that treatments and controls do differ?
• sampling variability or chance
• inherent differences between treatment and control subjects
• Differences in the handling and evaluation of the treatment and control groups during the course of the investigation
• True effect of the new procedure
62. A good experimental design will reduce, if not eliminate what two factors?
• inherent differences between treatment and control subjects
• Differences in the handling and evaluation of the treatment and control groups during the course of the investigation
63. Which design is the simplest, most used design?
Parallel group design
64. In which design are subjects randomized to groups and followed in a parallel fashion?
parallel group design
65. in parallel group design, each subject gets how many treatment assignments?
one
66. in which design do subjects act as their own control?
cross-over design
67. in which design are generally fewer subjects needed?
cross-over design
68. In cross-over design, subject differences do not interfere with...
treatment comparisons
69. in the cross-over design, subjects are randomized to order of.....
70. In which design is there a "washout" period so that the first treatment that subjects receive leaves the system?
cross-over design
71. What two problems are always a concern with the cross-over design?
• carry-over effects
• drop-outs
72. What kind of therapies are cross-over designs appropriate for?
therapies that may offer short-term relief of signs or symptoms and not a cure for a condition.
73. in the cross-over design, comparisons are made between groups based on ......
how we intended to treat the subject
74. Our comparisons are based on how we intended to treat the subject because...
Subjects may not always fully comply with the assigned treatment.
75. Three examples of Observational studies are...
• case-control study
• cohort study
• cross-sectional study
76. In which study are characteristics of a sample observed at one point in time?
cross-sectional study
77. What is the advantage to a cross-sectional study?
quick and cheap
78. what are the disadvantages to a cross-sectional study?
• often difficult to be sure that exposure precedes disease
• only measures prevalence
79. Which study is often called a "prospective study"?
cohort study
80. What is a cohort study?
when you have a group of subjects that are classified according to some characteristics that might be related to an outcome and then followed over time to observe the outcome.
81. What is the advantage to a cohort study?
good for rare exposures
82. what are the disadvantages of a cohort study?
• time consuming
• expensive
• good follow-up is difficult
83. why might a good follow up in a cohort study be difficult to obtain?
subject dropouts
84. What is a case-control study characterized by?
the identification of the two study groups on the basis of the presence or absence of the outcome of interest, and by retrospective observation of antecedent factors under study
85. In a case-control study, the control must be representative of...
the population from which the cases came
86. A matched case-control study is when...
a control is matched to every case by certain factors, so that the two groups will be more similar
87. what are the advantages of case-control study?
• good for rare diseases
• fast
• inexpensive
• can simultaneously examine several antecedent factors of interest
88. what are the disadvantages of the case-control study?
• not good for rare exposures
• indirect way of assessing effects
89. cohort and case-control studies are considered to be longitudinal, meaning...
the subjects are studied at more than one time
90. What is confounding?
a mixing of the effect of a third factor into the exposure-response relationship
91. Confounding can be controlled in the analysis if...
information on the confounders is available
92. What is a bias?
a condition, tendency or inclination that prevents a fair comparison of groups
93. A selection bias is...
when the decision to admit a subject to the trial, or which group to assign the subject to, is affected by knowledge of how the subject may respond to treatment.
94. In a volunteer bias, subjects who volunteer, or refuse new treatments may be...
very different
95. many studies show that _____ tend to be healthier than the general population and are more likely to comply with medical recommendations.
volunteers
96. Using a "placebo effect" creates a ______ bias.
response
97. what is a placebo effect?
knowledge that a patient is being treated affects the patient's response to treatment
98. if a placebo is being used, subjects should be ______ to the treatment if possible.
blinded
99. Assessment bias is when knowledge of...
the treatment received by the subject affects the researcher's assessment
100. a kind of bias in which an observer may be unconsciously prejudiced is...
assessment bias
101. assessment bias may be avoided by...
double blinding
102. Double blinding is...
when neither the subject nor the assessor knows which treatment the subject is receiving
103. A data set is..
a set of values of one or more variables for a collection of individuals or units
104. a binary endpoint variable is...
classifying the members of a given set of objects into two groups on the basis of whether they have some property or not (for example, whether they are sick or not)
105. a nominal endpoint variable is...
• classification based on a categorical sense. Subjects are classified by different qualitative categories.
• (if the values / observations belonging to it can be assigned a code in the form of a number where the numbers are simply labels.)
106. an ordinal endpoint variable is ...
classification based on an order of rank (1st, 2nd, 3rd, etc.)
107. a discrete endpoint variable is....
if the values / observations belonging to it are distinct and separate, i.e. they can be counted (1,2,3,....)
108. a continuous endpoint variable is...
if the values / observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data (for example, height, weight, temperature).
109. You can always convert a variable from a more ______ scale to less, but not ________.
• informative
• vice versa
110. What is the tabular display?
Frequency distribution
111. What are the graphical displays?
Frequency histogram, dot plot, stem-and-leaf plot, pie chart, Pareto chart, scatter plot, time series chart
112. Frequency distributions are _______ tables for a _____ variable
• one-way
• single
113. The relative frequency is...
Proportion of observations that take each value
114. a list of relative frequencies is often called the...
distribution of the variable
115. Frequency distribution shows how observed values are distributed across...
possible values.
116. For what kind of data are there sometimes too many values to list in a frequency distribution?
continuous and discrete
117. When using frequency distributions, it is good to use cumulative frequency and cumulative relative frequency when you have what kind of data?
ordinal or quantitative data
118. Cumulative frequency
running sum of frequencies
119. cumulative relative frequency
running sum of relative frequencies
120. the cumulative relative frequency gives proportion of observations that are...
less than or equal to the value
121. Cumulative relative frequency is not sensible for what kind of data?
nominal
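The frequency, relative frequency and their cumulative versions can be tabulated as below. This is a minimal sketch; the pain scores are made-up ordinal data, and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical ordinal data: pain scores on a 1-4 scale for 10 subjects.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 2, 3]

counts = Counter(scores)   # frequency of each observed value
n = len(scores)

rows = []
cumulative = 0
for value in sorted(counts):          # ordering is what makes "cumulative" sensible
    freq = counts[value]
    cumulative += freq                # running sum of frequencies
    rows.append((value, freq, freq / n, cumulative, cumulative / n))

print("value  freq  rel   cum  cum_rel")
for value, freq, rel, cum, cum_rel in rows:
    print(f"{value:>5} {freq:>5} {rel:>5.2f} {cum:>5} {cum_rel:>7.2f}")
```

The last cumulative relative frequency is always 1, because every observation is less than or equal to the largest value.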
122. A histogram is a...
graphical display of the frequency or relative frequency
123. The horizontal scale of the histogram is
quantitative
124. the vertical scale of the histogram measures...
frequency or relative frequency

125. This chart is a...
histogram
126. The dot plot is a one-way...
scatter plot
127. a dot plot is an alternative to the...
histogram

128. This graph is called...
dot plot
129. A pie chart shows a relative frequency of...
qualitative data values
130. In a pie chart, the area of each category is proportional to ...
the corresponding relative frequency
131. Angle for data entry in a pie chart =
relative frequency × 360°
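As a quick worked example of the angle rule, assuming some made-up category counts (the blood-type labels and numbers are hypothetical):

```python
# Hypothetical category counts for a qualitative variable.
counts = {"O": 45, "A": 40, "B": 11, "AB": 4}
total = sum(counts.values())

# Angle of each slice = relative frequency x 360 degrees.
angles = {category: count / total * 360 for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles add up to 360 degrees because the relative frequencies add up to 1.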
132. A Pareto chart is...
a chart that is similar to a histogram but used for qualitative data
133. in a pareto chart, the vertical axis may represent...
frequency or relative frequency
134. in a Pareto chart, the bars are ordered from...
largest to smallest frequency

135. This graph is a ...
Pareto chart
136. A graph for bivariate data is...
scatter plot
137. in a scatter plot, quantitative data is on which axis?
both of them
138. Each point of a scatter plot represents...
one observation
139. a scatter plot can show the relationship between...
two variables

140. What kind of graph is this?
Scatter plot
141. A time series chart is for....
entries taken regularly over time
142. A time series chart is useful for...
identifying trends

143. This type of graph is...
a time series chart
144. Using a mode is not useful with ____ data
continuous
145. the mode is useful for _____ data
discrete (ex. Counts)
146. the mean is only appropriate for ______ data
quantitative
147. Outliers are values which are...
different from the bulk of the data
148. the mean is very ______, always pulled in the direction of outliers.
sensitive
149. in response to outliers, the ____ is less sensitive than the mean
median
150. The average deviation from the mean always equals...
zero
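A small sketch of the last few cards, using made-up measurements: one extreme value drags the mean a long way but barely moves the median, and the deviations from the mean average out to zero (up to floating-point rounding).

```python
import statistics

data = [4, 5, 5, 6, 7]         # hypothetical measurements
with_outlier = data + [60]     # same data plus one extreme value

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# Deviations from the mean cancel out, so their average is zero.
deviations = [x - mean_before for x in data]
avg_deviation = sum(deviations) / len(deviations)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
print(f"average deviation: {avg_deviation:.10f}")
```

This is why the deviations are squared before averaging when measuring spread.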
151. the advantages of the sample standard deviation are...
• uses every observation
• mathematically manageable
152. disadvantage to using standard deviation
sensitive to extreme observations
153. The sample variance is...
the "average" of squared deviations
154. The variance/Standard deviation is always...
greater than or equal to 0
155. In terms of (variance/SD) a larger value means _______ variability.
more
156. for (variance/SD) = 0, there is no...
variability (all data have the same value)
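The sample variance and standard deviation can be computed by hand as follows. This sketch assumes the usual sample convention of dividing the sum of squared deviations by n - 1 (which is why "average" is in quotes); the data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]    # hypothetical observations
n = len(data)
mean = sum(data) / n

# "Average" of squared deviations, with n - 1 in the denominator (assumption).
squared_devs = [(x - mean) ** 2 for x in data]
sample_variance = sum(squared_devs) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# When all data have the same value there is no variability.
constant_variance = statistics.variance([3, 3, 3, 3])

print(sample_variance, sample_sd, constant_variance)
```

The hand computation agrees with `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.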
157. comparing two or more variances is only meaningful when in ...
the same units
158. the coefficient of variation is ...
a unitless measure of relative variability
159. the coefficient of variation can be used to...
compare the relative variation between any sets of values
160. the advantages of the CV are..
dimensionless, independent of units used
161. the disadvantages of the CV are..
statistically awkward, and can only be used when the mean is greater than 0
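A minimal sketch of the CV, taking it as the standard deviation divided by the mean (the standard definition, not spelled out in the cards). The function name and the height data are hypothetical. Expressing the same heights in centimetres and in metres gives the same CV, which is the "independent of units" property.

```python
import statistics

def coefficient_of_variation(values):
    """SD divided by the mean; only used when the mean is greater than 0."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("CV is only meaningful when the mean is greater than 0")
    return statistics.stdev(values) / mean

heights_cm = [160, 165, 170, 175, 180]       # hypothetical heights
heights_m = [h / 100 for h in heights_cm]    # same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(f"CV in cm: {cv_cm:.4f}, CV in m: {cv_m:.4f}")
```

The SDs of the two lists differ by a factor of 100, so comparing them directly would be meaningless, while the CVs match.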
162. p=
the percentage of data less than or equal to the desired percentile, divided by 100 (for the 25th percentile, p = 25/100 = 0.25)
163. percentiles are also called...
quantiles
164. i=
the position of the observation in the ordered data (1st, 2nd, etc.)
165. i formula=
p(N+1)
166. if i is an integer, then 100pth percentile =
x_i
167. if i is not an integer, then 100pth percentile =
x_k + (i - k)(x_{k+1} - x_k)
168. k=?
the integer part of i
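The percentile rule on these cards, i = p(N + 1) with linear interpolation between x_k and x_{k+1}, can be sketched as below. The function name and data are hypothetical, and clamping to the smallest or largest observation when i falls outside 1..N is an assumption added here, not something the cards specify.

```python
def percentile(values, p):
    """100p-th percentile using position i = p(N + 1) in the sorted data."""
    x = sorted(values)
    n = len(x)
    i = p * (n + 1)
    k = int(i)                 # integer part of i
    if k < 1:
        return x[0]            # assumption: clamp below the first observation
    if k >= n:
        return x[-1]           # assumption: clamp above the last observation
    # Python lists are 0-indexed, so x_k is x[k - 1] and x_{k+1} is x[k].
    # If i is an integer, (i - k) is 0 and this reduces to x_k.
    return x[k - 1] + (i - k) * (x[k] - x[k - 1])

data = [2, 4, 7, 8, 10, 12, 15]     # hypothetical data, N = 7
q1 = percentile(data, 0.25)         # i = 2, so Q1 = x_2 = 4
q2 = percentile(data, 0.50)         # i = 4, so Q2 = x_4 = 8
q3 = percentile(data, 0.75)         # i = 6, so Q3 = x_6 = 12
p40 = percentile(data, 0.40)        # i = 3.2, between x_3 = 7 and x_4 = 8
```

Other textbooks and software use slightly different position rules, so small differences in reported percentiles are normal.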
169. Quartiles are...
percentiles which divide the distribution into four equal parts
170. lower quartile = Q1= ___ percentile
25th
171. middle quartile = Q2= ____percentile
50th
172. upper quartile = Q3= ____percentile
75th
173. Range =
Largest observation - smallest observation (max-min)
174. the advantage of range is that..
it is easily determined
175. the disadvantages of Range are..
• only based on two values
• depends on the number of observations
• usually too sensitive to extreme observations to be useful
176. As more observations are added, the range never ______.
decreases
177. IQR (interquartile range) is based on...
the middle 50% of the data
178. IQR = formula?
IQR=Q3-Q1
179. the advantages to IQR are....
• not sensitive to extreme values
• independent of n
180. the disadvantages of IQR are...
• cannot be determined for small n
• does not directly use majority of data
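To see why the IQR is less sensitive to extreme values than the range, here is a small sketch with made-up data. It relies on `statistics.quantiles`, whose default "exclusive" method places quantiles at position p(N + 1), the same rule as in the percentile cards above.

```python
import statistics

data = [2, 4, 7, 8, 10, 12, 15]          # hypothetical data
with_extreme = data[:-1] + [150]         # largest value replaced by an extreme one

def spread_summary(values):
    q1, _median, q3 = statistics.quantiles(values, n=4)   # Q1, Q2, Q3
    return {"range": max(values) - min(values), "iqr": q3 - q1}

original = spread_summary(data)
extreme = spread_summary(with_extreme)

print(original)   # range 13, IQR 8
print(extreme)    # range 148, IQR still 8
```

Only the maximum changed, so the range jumps while the middle 50% of the data, and therefore the IQR, stays put.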
181. a boxplot is...
a graphical summary of the distribution of data for a single variable
182. a boxplot can also be called a...
box and whiskers plot
183. the three parts to the boxplot are...
• "box" covers interval from Q1 to Q3
• whiskers extend from box to furthest observation within 1.5 x IQR from box
• Observations beyond whiskers shown individually
184. a boxplot ______ data near the center of the distribution and ____ individual observations far from center
• summarizes
• shows
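The three parts of a boxplot can be computed without drawing anything. This is a sketch under assumptions: the function name and data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, so other quartile conventions could give slightly different box edges.

```python
import statistics

def boxplot_summary(values):
    x = sorted(values)
    q1, median, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers reach the furthest observations still within 1.5 x IQR of the box.
    inside = [v for v in x if low_fence <= v <= high_fence]
    # Observations beyond the whiskers are shown individually.
    outside = [v for v in x if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outside,
    }

summary = boxplot_summary([2, 4, 7, 8, 10, 12, 15, 40])   # hypothetical data
print(summary)
```

Here the value 40 lies beyond the upper fence, so the upper whisker stops at 15 and 40 is plotted as its own point.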
185. the number of peaks in the data is the number of...
modes
186. one peak in data =
unimodal
187. two peaks in data =
bimodal
188. symmetric data=
data values are mirrored about a central value, one side is the mirror-image of the other
189. skewed data=
data values are more spread out in one direction than another
190. right skew means it is ____ tailed and _____ skewed.
• right
• positively
191. in the right skew..
mean ___ median
mean > median
192. left skew means it is ____ tailed and _____ skewed.
• left
• negatively
193. in the left skew..mean ___ median
mean < median
194. skewness pulls the mean in the direction of....
the tail
195. a symmetric and unimodal graph means...
mean ___ median
mean = median
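The mean-versus-median relationships on the skewness cards can be checked on small made-up datasets. These three lists are hypothetical, chosen so that the long tail is easy to see.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]                   # mirrored about 3

for name, values in [("right skew", right_skewed),
                     ("left skew", left_skewed),
                     ("symmetric", symmetric)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{name}: mean = {mean}, median = {median}")
```

In each skewed case the mean sits on the tail side of the median, which is the "pulled toward the tail" behaviour.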
196. a reason a bimodal graph might occur is because...
random chance, or the sample is made up of two distinct subgroups
197. multiplying every observation by positive constant C does what three things?
• multiplies mean, median by C
• multiplies SD, Range, IQR by C
• multiplies variance by C^2
198. adding a constant C to every observation does what two things...
• Adds C to mean, median
• Does not change measures of spread (Range, IQR, Variance, SD)
199. linear transformations (adding or multiplying constant) changes what?
changes location or location and spread but not the shape of distribution
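The effect of multiplying by or adding a constant can be verified numerically. A minimal sketch with made-up data and C = 3 (both are arbitrary choices for illustration):

```python
import statistics

data = [2, 4, 4, 6, 9]          # hypothetical observations
C = 3                           # arbitrary positive constant

scaled = [C * x for x in data]  # multiply every observation by C
shifted = [x + C for x in data] # add C to every observation

def summarize(values):
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "variance": statistics.variance(values),
        "range": max(values) - min(values),
    }

base, after_scale, after_shift = summarize(data), summarize(scaled), summarize(shifted)

print(base)
print(after_scale)   # mean, median, SD, range times C; variance times C squared
print(after_shift)   # mean, median plus C; spread measures unchanged
```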
200. non-linear transformations change the ___ of a distribution
shape
201. non-linear transformations are useful when...
modeling data
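As one illustration of a non-linear transformation changing shape, a logarithm can turn strongly right-skewed values into roughly symmetric ones. The data below are made up (exact powers of ten) so that the effect is easy to see; real data will rarely behave this cleanly.

```python
import math
import statistics

skewed = [1, 10, 100, 1000, 10000]            # hypothetical right-skewed data
logged = [math.log10(x) for x in skewed]      # non-linear (log) transformation

# Gap between mean and median as a rough indicator of skew.
gap_before = statistics.mean(skewed) - statistics.median(skewed)
gap_after = statistics.mean(logged) - statistics.median(logged)

print(f"before: mean - median = {gap_before:.1f}")
print(f"after:  mean - median = {gap_after:.1f}")
```

Before the transformation the mean is far above the median; afterwards the two essentially coincide, which is the symmetric pattern.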

