Math 1040 Chapter 1

2015-02-16 18:00:14
Maw Alia maw alia slcc math 1040 statistics

This is a compiled list of key terms from chapter 1 of SLCC course Math 1040 with Prof. Alia Maw.

  1. Experiment
    a controlled study (different from an observational study) conducted to determine the effect that varying one or more explanatory variables has on a response variable. Any combination of the values of the variables (a.k.a. "factors") is a treatment.
  2. Factor
    an explanatory variable in an experiment. When there is only one factor, each of its levels is a treatment.
  3. treatment
    a combination of the values of the factors in an experiment. When there is only one explanatory variable, each level of that factor is a treatment.
  4. experimental unit
    (a.k.a. subject) is a person, object or some other well-defined item upon which a treatment is applied.
  5. control group
    serves as the baseline against which other treatments are compared. Receives the placebo.
  6. Placebo
    is an innocuous medication that looks, tastes and smells like the experimental medication (e.g., a sugar tablet)
  7. Blinding
    refers to nondisclosure of the treatment an experimental unit is receiving (not telling the group(s) involved in an experiment which treatment they are receiving during the study).
  8. Single-blind experiment
    is one in which the experimental unit (or subject) does not know which treatment he/she is receiving (but the experimenter does).
  9. Double-blind experiment
    is one in which neither the experimental unit (subject) nor the researcher in contact with the experimental unit knows which treatment the subject is receiving. A third party is typically involved to monitor administration of treatments and ensure that subjects and the researchers involved with them remain blinded throughout the experiment.
  10. what does it mean for the experiment to be placebo controlled?
    The placebo control group serves as a baseline against which to compare the results from the group receiving the true treatment. The placebo is also used because people tend to behave differently when they are in a study. Having a placebo control group neutralizes this effect.
  11. What does it mean for the experiment to be double-blind?
    Neither participants nor researchers know who in the study is receiving which type of treatment (actual treatment or placebo). This helps ensure that subjects receiving the placebo do not behave differently than those receiving the treatment and researchers do not treat either group differently than the other.
  12. Name the six steps used in Conducting a Designed Experiment:
    (1) Identify the problem to be solved - a statement of the problem (a.k.a. the "claim"); it should be as explicit as possible and provide the experimenter with direction. It must identify the response variable and the population to be studied.

    (2) Determine the factors that affect the response variable - usually identified by an expert; ask, "What things affect the value of the response variable?" Once identified, decide which factors to fix at a predetermined level, which to manipulate, and which to leave alone.

    (3) Determine the number of experimental units (how many subjects will we include in our sample?) - general rule: choose as many subjects as time and money allow.

    (4) Determine the level of each factor (explanatory variable of interest) - factors can be controlled or randomized. When controlled, one of two things occurs:

    • (a) set the level of a factor at one value throughout the experiment if there is no interest in its effect on the response variable.
    • (b) set the level of a factor at various levels if there is interest in its effect on the response variable. The combinations of the levels of the varied factors make up the experimental treatments.

    When randomized, the subjects are randomly assigned to various treatment groups. In this way, the effect of factors whose levels cannot be controlled is minimized (randomization averages out - or mutes - the effects of uncontrolled factors).

    (5) Conduct the experiment - randomly assign the subjects to the treatments and collect and process the data. Replication occurs when each treatment is applied to more than one subject; this helps ensure that an observed effect is not due to some characteristic of a single subject.

    (6) Test the claim - this is inferential statistics - make generalizations about the population based on the sample's results and provide a statement as to your level of confidence in the results.
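
    As a rough illustration of the random assignment in step (5), the sketch below shuffles a made-up list of subjects and deals them into treatment groups. The function name assign_treatments, the subject labels, and the two treatment names are hypothetical and not part of the course material; dealing subjects out in turn is just one simple way to get groups of near-equal size.

```python
import random


def assign_treatments(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in near-equal group sizes.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)      # seeded so the example is repeatable
    shuffled = list(subjects)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)          # chance, not the researcher, decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Made-up subjects and treatment names (assumptions for the example).
subjects = [f"subject_{i}" for i in range(1, 13)]
groups = assign_treatments(subjects, ["treatment", "placebo"], seed=1)
for name, members in groups.items():
    print(name, members)
```

    Each group here ends up with six subjects, so each treatment is applied to more than one subject, which is the replication mentioned in step (5).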
  13. What does it mean for the experiment to be randomized?
    The subjects are randomly assigned to take either the treatment or the placebo.
  14. Define blocking
    Grouping together similar subjects and then randomly assigning subjects within each group to a treatment.
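    A minimal sketch of blocking as defined above, assuming the subjects are blocked on a single made-up characteristic (sex). The function name randomized_block_assignment and the subject data are hypothetical; the point is only that the shuffle happens separately inside each block.

```python
import random
from collections import defaultdict


def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Group similar subjects into blocks, then randomize within each block.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)   # similar subjects together
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                        # randomize inside the block
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment


# Made-up subjects as (name, sex) pairs; sex is the blocking variable here.
subjects = [("Ana", "F"), ("Bea", "F"), ("Cat", "F"), ("Dee", "F"),
            ("Eli", "M"), ("Fox", "M"), ("Gus", "M"), ("Hal", "M")]
assignment = randomized_block_assignment(
    subjects, lambda s: s[1], ["treatment", "placebo"], seed=2)
for subject, treatment in assignment.items():
    print(subject, treatment)
```

    Within each block, half of the subjects receive each treatment, so the blocking variable cannot be mixed up with the treatment effect.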
  15. Replication in an experiment is...
    applying each treatment to more than one experimental unit.
  16. Statistics is...
    the science of collecting, organizing, summarizing, and analyzing information to answer questions and draw conclusions under situations of uncertainty.
  17. Data is...
    Information used to draw a conclusion or make a decision. Data do vary.
  18. Goal of statistics is to...
    describe and understand sources of variability
  19. population
    entire group to be studied (parameter is to population as statistic is to sample)
  20. individual
    a member of the population
  21. A statistic is...
    a numerical summary of a sample (statistic is to sample as parameter is to population)
  22. Descriptive statistics is...
    organizing and summarizing data. It describes results of a sample without making any general conclusions about the population.
  23. Inferential Statistics do what?
    Extend the results of a sample to the population it represents. Inferential stats also measure the reliability of a result.
  24. parameter
    numerical summary of a population (not of a sample) (e.g., some percentage of a population owns a car)
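    To make the parameter/statistic distinction concrete, the sketch below builds a made-up population of ages, computes its mean (a parameter), then computes the mean of a random sample from it (a statistic). All of the numbers are invented for illustration.

```python
import random
import statistics

rng = random.Random(0)  # seeded so the example is repeatable

# Made-up population: ages of 1000 individuals (an assumption for the example).
population = [rng.randint(18, 65) for _ in range(1000)]
parameter = statistics.mean(population)   # numerical summary of the population

# A sample of 50 individuals drawn at random from that population.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)       # numerical summary of the sample

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

    The statistic will usually be close to the parameter but not equal to it, and a different sample would give a different statistic.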
  25. Qualitative
    is categorical; it allows for classification based on an attribute or characteristic
  26. Quantitative
    numerical measures of individuals (e.g., how many miles were traveled, how long the table is in inches, etc.). Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and give meaningful results.
  27. If variables did not vary,...
    then they would be constants and inferential statistics would not be needed.
  28. 4 Steps of the Process of Statistics
    • (1) identify research objective
    • (2) collect information needed to answer the question posed.
    • (3) describe the data
    • (4) perform inference

    The goal of research is to learn the causes of variability.
  29. Discrete variable is...
    a quantitative variable that has a finite number of possible values or a countable number of possible values (hint: if you count to get the variable, it is discrete)
  30. Continuous variable is...
    quantitative variable that has an infinite number of possible values that are not countable. (hint: if you measure to get the variable, it is continuous)
  31. What types make up each of the following three categories: observational studies, designed experiments, and sampling methods?
    (1) Observational Studies (end with 'studies')

    Case-Control Studies - these are retrospective. Researchers look back in time at existing records. In these studies, individuals with certain characteristics are matched with those who do not have these characteristics.

    pros: better than cross-sectional

    cons: memory of participants may not be accurate and records might be lost.

    Cross-Sectional Studies - studies that collect information about individuals at a specific point in time or over a very short period.

    Cohort Studies - called prospective studies because they are done going forward. A group of individuals (cohort) is identified to participate and is observed over time (characteristics are recorded over time by the researcher)

    pros: no need to rely on cohort to report info to researchers

    cons: time and labor intensive

    *Cross-sectional and case control are relatively inexpensive and allow researchers to explore possible associations before undertaking large cohort studies or designed experiments.

    Remember there is no point in spending energy obtaining data that already exists.

    (2) Designed Experiments (end with 'design')

    Completely Randomized Design - a study wherein each subject is randomly assigned to a treatment.

    Matched-Pairs Design - (a.k.a. before-after or pretest-posttest experiments)

    This is an experimental design in which the subjects are paired up. Pairs are selected so that they are related in some way (twins, siblings, husband and wife, same geographical location, etc.). There are only two levels of treatment in a matched-pairs design. One individual receives one treatment, and the other receives the other treatment. Treatment is assigned by randomization. Often, the response variable is measured on each subject before and after the treatment has been applied; in doing this, the individual is matched against him/herself.

    (3) Sampling methods (end with 'sampling')

    Simple Random Sampling - a sample of size "n" from a population of size "N" is a simple random sample if every possible sample of size "n" has an equal chance of occurring. The sample size is always less than the size of the population.
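
    A small sketch of a simple random sample, assuming a made-up frame of N = 30 labeled individuals and n = 5. Python's random.sample draws without replacement so that every subset of size n is equally likely, and math.comb counts how many such samples are possible. The labels and sizes are invented for illustration.

```python
import math
import random

# Made-up frame: a list of all N = 30 individuals in the population.
frame = [f"individual_{i}" for i in range(1, 31)]
n = 5

rng = random.Random(42)      # seeded so the example is repeatable
srs = rng.sample(frame, n)   # every possible sample of size n is equally likely

# Number of distinct samples of size n that could have been drawn.
num_possible = math.comb(len(frame), n)

print(srs)
print("possible samples of size", n, ":", num_possible)
```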

    Stratified Sampling - obtained by dividing the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals in each stratum should be similar in some way (homogeneous).

    pros: may allow fewer to be surveyed while obtaining the same or more information
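
    The stratified sampling idea above can be sketched as follows, assuming a made-up frame in which each individual carries a class-year label that defines the strata. The function name stratified_sample and the data are hypothetical, and taking the same number from every stratum is just one possible allocation.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample from each non-overlapping stratum.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in frame:
        strata[stratum_of(individual)].append(individual)
    # One simple random sample per stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Made-up frame of (id, class_year) pairs: 60 freshmen and 40 sophomores.
frame = [(i, "freshman" if i <= 60 else "sophomore") for i in range(1, 101)]
sample = stratified_sample(frame, lambda ind: ind[1], 5, seed=3)
for stratum, members in sample.items():
    print(stratum, members)
```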

    Systematic Sampling - obtained by selecting every "kth" individual from the population (e.g., every 10th individual). The first individual selected corresponds to a random number between 1 and "k". No frame is required. If the size of the population is unknown, there is no way to mathematically find "k".

    pros: less chance of interviewer error, easier to employ, typically provides more info for a given cost than does Simple Random Sampling (SRS)

    Cluster Sampling - obtained by selecting all individuals within a randomly selected collection or group of individuals.

    pros: reduces the travel time likely required in stratified sampling or SRS, no need to obtain a frame

    Convenience Sampling - the individuals are easily obtained and not based on randomness. The most popular convenience samples are those in which the individuals are self-selected (a.k.a. voluntary response samples).
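
    For contrast with the stratified approach, here is a minimal sketch of cluster sampling as defined above: whole groups are chosen at random, and then every individual in each chosen group is included. The function name cluster_sample and the city-block data are hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep every individual in them.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # random clusters
    sample = [ind for name in chosen for ind in clusters[name]]
    return chosen, sample


# Made-up clusters: five city blocks with four households each.
clusters = {f"block_{c}": [f"{c}{i}" for i in range(1, 5)] for c in "ABCDE"}
chosen, sample = cluster_sample(clusters, 2, seed=5)
print("chosen clusters:", chosen)
print("sample:", sample)
```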
  32. Outline the tree that organizes the different kinds of variables:
    Variables -->

    Qualitative, Quantitative, or Level of Measurement -->

    • Qualitative variables are just that, qualitative
    • Quantitative variables can be discrete or continuous
    • Level of measurement variables can be at the nominal, ordinal, interval, or ratio level
  33. Data
    the list of observed values for a variable
  34. qualitative data are...
    observations corresponding to a qualitative variable (fits into a category)
  35. quantitative data are...
    observations corresponding to a quantitative variable (numerical measure)
  36. discrete variables can be ________ while continuous variables are __________.
    counted; measured
  37. Levels of measurement
    • A variable is at the...
    • nominal level of measurement if the values of the variable name, label, or categorize and cannot be ranked in any order.
    • ordinal level of measurement if it has the properties of the nominal level of measurement and can be ranked
    • interval level of measurement if it has the properties of the ordinal level and the differences in the values of the variable have meaning. A value of "zero" does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.
    • ratio level of measurement if it has the properties of the interval level and the ratios of the values of the variable have meaning. Value of "zero" means absence of quantity. Multiplication and division can be performed.
  38. Name and describe two methods for collecting data
    • Observational studies: measure the value of the response without attempting to influence the value of either the response or explanatory variables. (participants can choose to do what they want to do)
    • Designed experiment: a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group. (the researcher, not the participant, determines the value of the explanatory variable)
  39. Confounding occurs when...
    the effects of two or more explanatory variables are not separated so the relation between explanatory and response variable may be due to some other variable(s) not accounted for in the study (the effect of two factors - explanatory variables on the response variable - cannot be distinguished).

    A confounding variable is an explanatory variable that was considered in the study whose effect cannot be distinguished from a second explanatory variable in the study. A confounding variable is not necessarily associated with other explanatory variables, but it has an effect on the response variable.
  40. Lurking variable is...
    an explanatory variable that was not considered in a study but that affects the value of the response variable (related to explanatory and response variables).
  41. In observational studies, what can we say and not say?
    We cannot say that changes in the explanatory variable cause the changes in the response variable. We can say only that the changes may be associated.
  42. What are designed experiments required for?
    making statements of causality
  43. What is a frame?
    a list of all the individuals within the population
  44. Why do inferences based on samples vary?
    because the individuals in the sample vary. the individuals differ from sample to sample because chance is used to select the individuals.
  45. Goal of effective sampling methods: obtain ___ ____ ___________ as possible about the __________ at the _____ ____.
    as much information, population, least cost
  46. what can result from choosing a value of "k" that is too small?
    the sample may not be representative of the population, because selection ends after covering only the first part of the population.
  47. what can result from choosing a value of "k" that is too large?
    the desired sample size may not be achieved, because the population runs out before enough individuals have been selected.
  48. To find "k" in a systematic sample, what does one do?
    approximate the population size, divide it by the desired sample size, and round down to the nearest integer. This value is "k". "p" is a random number between 1 and "k" that identifies the first individual selected.
  49. List the five steps for obtaining a systematic sample:
    (1) If possible, approximate the population size

    (2) determine the sample size

    (3) compute "N"/"n" and round down to the nearest integer. This value is "k"

    (4) randomly select a number between 1 and "k". call this number "p".

    (5) the sample will consist of the following individuals: p, p + k, p + 2k,..., p + (n-1)k
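
    The five steps above can be sketched as follows, assuming a made-up population size N = 4502 and a desired sample size n = 40. The function name systematic_sample is hypothetical, and the sketch assumes n is no larger than N so that k is at least 1.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return k, p, and the positions p, p + k, ..., p + (n - 1)k.

    Hypothetical helper for illustration only; assumes n <= N.
    """
    rng = random.Random(seed)
    k = N // n                  # step 3: N / n rounded down to an integer
    p = rng.randint(1, k)       # step 4: random start between 1 and k inclusive
    positions = [p + i * k for i in range(n)]   # step 5
    return k, p, positions


# Made-up sizes (assumptions for the example).
N, n = 4502, 40
k, p, positions = systematic_sample(N, n, seed=7)
print("k =", k, "p =", p)
print("first five positions:", positions[:5])
print("last position:", positions[-1])
```

    Because k is rounded down, the last position p + (n - 1)k never exceeds N, so the desired sample size is always reached.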
  50. If the clusters have __________ individuals, it is better to have _____ _________ with _____ ___________ in each cluster.
    homogeneous, more clusters, fewer individuals
  51. When each cluster is _____________, ______ ________ with ____ ___________ in each cluster are appropriate.
    heterogeneous, fewer clusters, more individuals
  52. most large scale surveys obtain samples using a combination of the ______ ______ ________, __________ ________, and _______ _________.
    • simple random sampling
    • systematic sampling
    • cluster sampling
  53. differentiate between stratified and cluster sampling
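    • stratified: the population is divided into strata based on similar characteristics, and a simple random sample is taken from each stratum
    • cluster: a collection or group of individuals in the population is randomly selected, and all individuals in each selected cluster are included in the sample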
  54. bias
    • can mean to give preference to selecting some individuals over others (e.g., choosing men instead of women). Can also mean that certain responses are more likely to occur in the sample than in the population. Either way, the results of the sample are not representative of the population.
    • frames cannot have sampling bias
  55. undercoverage
    results when the proportion of one population segment is lower in a sample than it is in the population
  56. sampling bias
    technique used to obtain the sample's individuals tends to favor one part of the population over another. convenience sampling has this bias frequently.
  57. nonresponse bias
    exists when individuals selected to be in the sample do not respond to the survey and have different opinions from responders. Can occur when individuals selected for the sample do not wish to respond or the interviewer was not able to contact them. Can be controlled with callbacks, incentives, or rewards.
  58. response bias
    • exists when the answers on a survey do not reflect the true feelings of the respondent. occurs in a number of ways. includes data-entry error
    • interviewer error: can occur with poorly trained interviewers and if the survey's sponsor has a vested interest in the result
    • misrepresented answers: responses that misrepresent facts or are lies.
    • wording of questions: must be worded and asked in a balanced form
    • open and closed ended questions: in open, the respondent chooses his/her own response; a wide variety of responses is to be avoided. In closed, respondents pick from a list of given responses; this is easier to analyze, but the respondent's choice may not be listed. Avoid "no opinion" as an answer option, and consider how many choices should be given.
  59. non-sampling error
    occurs from obtaining and recording the information collected. results from undercoverage, nonresponse bias, response bias, or data entry error. these errors could be present in a census.
  60. sampling error
    occurs because we cannot learn everything about a population from a sample. a sample gives incomplete information about the population. results from using a subset of the population to describe characteristics of the population.
  61. when sampling results in __________ ___________, we mean that the ___________ in the sample ______ reveal ___ ___ ___________ about the __________.
    incomplete information, individuals, cannot, all the information, population