The flashcards below were created by user SusanneS28 on FreezingBlue Flashcards.

Data
 Consist of information coming from observations, counts, measurements, or
 responses. Example: “People who eat three daily servings of whole grains have
 been shown to reduce their risk of…stroke by 37%.”

Statistics
 The science of collecting, organizing, analyzing, and interpreting data in
 order to make decisions.

Data Sets
 Population: The collection of all outcomes, responses, measurements, or counts
 that are of interest.
 Sample: A subset of the population.

Parameter
 A number that describes a population characteristic.
 Example: Average age of all people in the United States

Statistic
 A number that describes a sample characteristic.
 Example: Average age of people from a sample of three states
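
A minimal sketch of the difference between a parameter and a statistic. The ages below are randomly generated, hypothetical values, not real census data; the names `population_ages` and `sample_ages` are made up for illustration.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 people (made-up values).
rng = random.Random(0)  # fixed seed only so the run is repeatable
population_ages = [rng.randint(0, 90) for _ in range(1000)]

# Parameter: computed from every member of the population.
population_mean = mean(population_ages)

# Statistic: computed from a sample drawn from that population.
sample_ages = rng.sample(population_ages, 50)
sample_mean = mean(sample_ages)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not identical, because the statistic is based on only part of the population.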

Branches of Statistics
 Descriptive statistics and inferential statistics

Descriptive Statistics
 Involves organizing, summarizing, and displaying data.
 Examples: tables, charts, averages

Inferential Statistics
Involves using sample data to draw conclusions about a population.


Qualitative Data
 Consists of attributes, labels, or nonnumerical entries.

Quantitative Data
 Consists of numerical measurements or counts.

Designing a Statistical Study
 Identify the variable(s) of interest (the focus) and the population of the study.
 Develop a detailed plan for collecting data. If you use a sample, make sure the
 sample is representative of the population.
 Collect the data.
 Describe the data using descriptive statistics techniques.
 Interpret the data and make decisions about the population using inferential
 statistics.
 Identify any possible errors.

Observational study
 •A researcher observes and measures characteristics of interest of part of a
 population.
 •Example: Researchers observed and recorded the mouthing behavior on nonfood
 objects of children up to three years old. (Source: Pediatric Magazine)

Experiment
 •A treatment is applied to part of a population and responses are observed.

Simulation
 •Uses a mathematical or physical model to reproduce the conditions of a
 situation or process.
 •Often involves the use of computers.
 •Example: Automobile manufacturers use simulations with dummies to study the
 effects of crashes on humans.

Control
 for effects other than the one being measured.

•Confounding variables
 •Occur when an experimenter cannot tell the difference between the effects of
 different factors on a variable.
 •Example: A coffee shop owner remodels her shop at the same time a nearby mall
 has its grand opening. If business at the coffee shop increases, it cannot be
 determined whether it is because of the remodeling or the new mall.

•Placebo effect
 •A subject reacts favorably to a placebo when in fact he or she has been given
 no medical treatment at all.
 •Blinding is a technique where the subject does not know whether he or she is
 receiving a treatment or a placebo.
 •In a double-blind experiment, neither the subject nor the experimenter knows
 whether the subject is receiving a treatment or a placebo.

Simple Random Sample
 •Every possible sample of the same size has the same chance of being selected.

Stratified Sample
 •Divide a population into groups (strata) and select a random sample from each
 group.

Cluster Sample
 •Divide the population into groups (clusters) and select all of the members in
 one or more, but not all, of the clusters.

Systematic Sample
 •Choose a starting value at random. Then choose every kth member of the
 population.

nominal level of measurement
 A variable is at the nominal level of measurement if the values of the variable
 name, label, or categorize. In addition, the naming scheme does not allow for
 the values of the variable to be arranged in a ranked, or specific, order.

ordinal level of measurement
 A variable is at the ordinal level of measurement if it has the properties of
 the nominal level of measurement and the naming scheme allows for the values of
 the variable to be arranged in a ranked, or specific, order.

interval level of measurement
 A variable is at the interval level of measurement if it has the properties of
 the ordinal level of measurement and the differences in the values of the
 variable have meaning. A value of zero in the interval level of measurement
 does not mean the absence of the quantity. Arithmetic operations such as
 addition and subtraction can be performed on values of the variable.

ratio level of measurement
 A variable is at the ratio level of measurement if it has the properties of the
 interval level of measurement and the ratios of the values of the variable have
 meaning. A value of zero in the ratio level of measurement means the absence of
 the quantity. Arithmetic operations such as multiplication and division can be
 performed on the values of the variable.

Confounding
 in a study occurs when the effects of two or more explanatory variables are not
 separated. Therefore, any relation that may exist between an explanatory
 variable and the response variable may be due to some other variable or
 variables not accounted for in the study.

lurking variable
 A lurking variable is an explanatory variable that was not considered in a
 study, but that affects the value of the response variable in the study. In
 addition, lurking variables are typically related to any explanatory variables
 considered in the study.

Cross-sectional Studies
 Observational studies that collect information about individuals at a specific
 point in time, or over a very short period of time.

Case-control Studies
 These studies are retrospective, meaning that they require individuals to look
 back in time or require the researcher to look at existing records. In
 case-control studies, individuals that have certain characteristics are matched
 with those that do not.

Cohort Studies
 A cohort study first identifies a group of individuals (the cohort) to
 participate in the study. The cohort is then observed over a period of time.
 Over this time period, characteristics about the individuals are recorded.
 Because the data is collected over time, cohort studies are prospective.

census
 A census is a list of all individuals in a population along with certain
 characteristics of each individual.

Random sampling
 is the process of using chance to select individuals
 from a population to be included in the sample.

simple random sampling
 A sample of size n from a population of size N is obtained through simple
 random sampling if every possible sample of size n has an equally likely chance
 of occurring. The sample is then called a simple random sample.

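The 110th Congress of the United States had 435 members
in the House of Representatives. Explain how to obtain a simple random sample
of 5 members to attend a Presidential luncheon.
Then obtain the sample.
 Put the members in alphabetical order. Number the members from 1 to 435.
 Then use a random number generator or a table of random numbers to select 5
 distinct numbers from 1 to 435. The members with those numbers form the sample.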
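
One way to carry out the selection for the House example is sketched below, assuming the 435 members have already been alphabetized and numbered. The seed value is arbitrary and is fixed only so the run can be repeated.

```python
import random

N = 435  # members numbered 1 through 435 after alphabetizing
n = 5    # number of members to select

rng = random.Random(110)  # arbitrary seed, used only for repeatability

# random.sample draws without replacement, so no member is chosen twice
# and every possible group of 5 numbers is equally likely.
chosen_numbers = sorted(rng.sample(range(1, N + 1), n))
print(chosen_numbers)
```

The members whose numbers appear in `chosen_numbers` would attend the luncheon.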

stratified sample
 A stratified sample is one obtained by separating the population into
 homogeneous, nonoverlapping groups called strata, and then obtaining a simple
 random sample from each stratum.
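
A minimal sketch of stratified sampling. The strata (students grouped by class year), the helper name `stratified_sample`, and the choice of 3 individuals per stratum are all hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` individuals from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical, nonoverlapping strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(1, 41)],
    "sophomore": [f"S{i}" for i in range(1, 31)],
    "junior": [f"J{i}" for i in range(1, 21)],
}

stratified = stratified_sample(strata, per_stratum=3, seed=1)
for name, members in stratified.items():
    print(name, members)
```

Every stratum contributes to the sample, which is what distinguishes this from a cluster sample.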

systematic sample
 A systematic sample is obtained by selecting every kth individual from the
 population. The first individual selected corresponds to a random number
 between 1 and k.
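
A minimal sketch of systematic sampling, assuming the population is available as an ordered list. The helper name `systematic_sample` and the population of 100 numbered individuals are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Select every kth individual, starting at a random position from 1 to k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random number between 1 and k, inclusive
    # Positions are counted from 1, so position p is list index p - 1.
    return population[start - 1::k]

population = list(range(1, 101))  # individuals numbered 1 through 100
every_tenth = systematic_sample(population, k=10, seed=3)
print(every_tenth)
```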

cluster sample
 A cluster sample is obtained by selecting all individuals within a randomly
 selected collection or group of individuals.
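
A minimal sketch of cluster sampling. The clusters (households grouped by city block) and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return chosen, [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: four households on each of five city blocks.
clusters = {
    f"block{b}": [f"block{b}-house{h}" for h in range(1, 5)]
    for b in range(1, 6)
}

chosen_blocks, households = cluster_sample(clusters, num_clusters=2, seed=7)
print(chosen_blocks)
print(households)
```

Only the clusters are chosen at random; once a cluster is chosen, all of its members are included.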

convenience sample
 A convenience sample is one in which the individuals in the sample are easily
 obtained. Any studies that use this type of sampling generally have results
 that are suspect. Results should be looked upon with extreme skepticism.

Bias
 If the results of the sample are not representative of the population, then the
 sample has bias.
 Three Sources of Bias
 Sampling bias
 Nonresponse bias
 Response bias

Sampling bias
 means that the technique used to obtain the individuals to be in the sample
 tends to favor one part of the population over another.

Undercoverage
 Undercoverage is a type of sampling bias. Undercoverage occurs when the
 proportion of one segment of the population is lower in a sample than it is in
 the population.

Nonresponse bias
 Nonresponse bias exists when individuals selected to be in the sample who do
 not respond to the survey have different opinions from those who do.

Response bias
 exists when the answers on a survey do not reflect the
 true feelings of the respondent.
 Types of Response Bias
 Interviewer error
 Misrepresented answers
 Words used in survey question
 Order of the questions or words within the question

Nonsampling errors
 are errors that result from sampling bias, nonresponse bias, response bias, or
 data-entry error. Such errors could also be present in a complete census of the
 population.

Sampling error
 is error that results from using a sample to estimate
 information about a population. This type of error occurs because a sample
 gives incomplete information about a population.

raw data
 When data is collected from a survey or designed experiment, it must be
 organized into a manageable form. Data that is not organized is referred to as
 raw data.

frequency distribution
 A frequency distribution lists each category of data and the number of
 occurrences for each category of data.

relative frequency
 The relative frequency is the proportion (or percent) of observations within a
 category and is found using the formula:

 relative frequency = frequency / (sum of all frequencies)

 A relative frequency distribution lists the relative frequency of each category
 of data.
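
As a minimal sketch, a frequency distribution and a relative frequency distribution can be built by counting each category and dividing by the total number of observations. The survey responses and the helper name `relative_frequencies` are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return (frequency distribution, relative frequency distribution)."""
    counts = Counter(observations)
    total = sum(counts.values())  # sum of all frequencies
    relative = {category: count / total for category, count in counts.items()}
    return dict(counts), relative

# Hypothetical raw data from a survey question.
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes", "no"]
freq, rel_freq = relative_frequencies(responses)

for category in freq:
    print(f"{category:10} {freq[category]:3} {rel_freq[category]:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.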

bar graph
 A bar graph is constructed by labeling each category of data on
 either the horizontal or vertical axis and the frequency or relative frequency
 of the category on the other axis.

Pareto chart
 A Pareto chart is a bar graph in which the bars are drawn in decreasing order
 of frequency or relative frequency.

Difference between discrete and continuous data
 The first step in summarizing quantitative data is to determine whether the
 data is discrete or continuous. If the data is discrete and there are
 relatively few different values of the variable, the categories of data will be
 the observations (as in qualitative data). If the data is discrete, but there
 are many different values of the variable, or if the data is continuous, the
 categories of data (called classes) must be created using intervals of numbers.
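
A minimal sketch of creating classes from intervals of numbers. It assumes equal-width classes of the form lower limit <= value < next lower limit and that no value falls below the first lower limit. The data values and the helper name `class_frequencies` are hypothetical.

```python
from collections import Counter

def class_frequencies(values, lower_limit, class_width):
    """Count how many values fall in each class [low, low + class_width)."""
    counts = Counter((v - lower_limit) // class_width for v in values)
    table = {}
    for index in range(max(counts) + 1):
        low = lower_limit + index * class_width
        table[(low, low + class_width)] = counts[index]  # 0 if class is empty
    return table

# Hypothetical data with many different values.
ages = [20, 23, 27, 31, 35, 35, 38, 42, 47, 49, 52, 58]
class_table = class_frequencies(ages, lower_limit=20, class_width=10)

for (low, high), count in class_table.items():
    print(f"{low} to under {high}: {count}")
```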

histogram
 A histogram is constructed by drawing a rectangle for each class of data, with
 height equal to the frequency or relative frequency of the class. The
 rectangles should all have the same width and should touch each other.

stem-and-leaf plot
 A stem-and-leaf plot uses digits to the left of the rightmost digit to form the
 stem. Each rightmost digit forms a leaf.
 For example, a data value of 147 would have 14 as the stem and 7 as the leaf.
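
A minimal sketch of splitting values into stems and leaves, assuming non-negative whole numbers. The data values and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's rightmost digit (leaf) under its remaining digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 147 -> stem 14, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

data = [147, 152, 149, 163, 158, 147, 160]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```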

dot plot
 A dot plot is drawn by placing each observation horizontally in increasing
 order and placing a dot above the observation each time it is observed.

class midpoint
 The class midpoint is found by adding consecutive lower class limits and
 dividing the result by 2.
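
A minimal sketch of the midpoint calculation. The lower class limits are hypothetical; the last entry is the lower limit of the class that would follow the final class, which is needed to get the final midpoint.

```python
def class_midpoints(lower_limits):
    """Average each pair of consecutive lower class limits."""
    return [(a + b) / 2 for a, b in zip(lower_limits, lower_limits[1:])]

# Classes 20-29, 30-39, 40-49, 50-59; 60 would start the next class.
lower_limits = [20, 30, 40, 50, 60]
midpoints = class_midpoints(lower_limits)
print(midpoints)
```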

frequency polygon
 A frequency polygon is drawn by plotting a point above each class midpoint on a
 horizontal axis at a height equal to the frequency of the class. After the
 points for each class are plotted, draw straight lines between consecutive
 points.

cumulative frequency distribution
 A cumulative frequency distribution displays the aggregate frequency of the
 category. In other words, for discrete data, it displays the total number of
 observations less than or equal to the category. For continuous data, it
 displays the total number of observations less than or equal to the upper class
 limit of a class.

cumulative relative frequency distribution
 A cumulative relative frequency distribution displays the aggregate proportion
 (or percent) of observations less than or equal to the category.
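
A minimal sketch of both cumulative distributions, computed as running totals. The class limits and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency distribution, classes listed in increasing order.
upper_limits = [29, 39, 49, 59]
frequencies = [3, 4, 3, 2]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by the overall total.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for limit, c, cr in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit}: {c:3} {cr:.3f}")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.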

ogive
 An ogive (read as “oh jive”) is a graph that represents the cumulative
 frequency or cumulative relative frequency for the class. It is constructed by
 plotting points whose x-coordinates are the upper class limits and whose
 y-coordinates are the cumulative frequencies or cumulative relative
 frequencies. After the points for each class are plotted, draw straight lines
 between consecutive points. An additional line segment is drawn connecting the
 first point to the horizontal axis at the upper limit of the class that would
 precede the first class (if it existed).

time series data
 If the value of a variable is measured at different points in time, the data is
 referred to as time series data.


