Data cleaning
The review of data for accuracy and completeness.
A clean data set contains a group of data that are ready for coding and analysis.


Bar Chart
•A type of graph that shows the frequency of cases for categories of a categorical (discrete) variable, such as a Yes/No variable.

Histogram
A chart similar to a bar chart, used for continuous variables


Continuous Variable
A variable that could have an infinite number of values along a continuum.
 Ex: height, weight, and blood sugar level
 Cannot be answered with yes/no


Line Graph
Used to display trends, e.g., time trends

Pie Chart
A circle that shows the proportion of cases according to several categories

Ratio
 The value obtained by dividing one quantity by another.
 Examples are:
 Rates
 Proportions
 Percentages

Ratio Calculation
 With respect to AIDS mortality, the sex ratio of deaths (male to female deaths) = X/Y, where:
 ─X = 450,451 (male deaths)
 ─Y = 89,895 (female deaths)
 The sex ratio = 450,451/89,895 ≈ 5 to 1.
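
As a quick check of the arithmetic above, the sex ratio can be reproduced in a few lines of Python. This is only a sketch; the variable names are illustrative and do not come from the source material.

```python
# Sex ratio of AIDS deaths, using the counts given above.
male_deaths = 450_451    # X
female_deaths = 89_895   # Y

# A ratio is simply one quantity divided by another.
sex_ratio = male_deaths / female_deaths
print(f"Sex ratio (male to female): {sex_ratio:.2f} to 1")
```

The result is about 5.01, which rounds to the "5 to 1" reported above.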


Proportion
•A type of ratio in which the numerator is part of the denominator
•May be expressed as a percentage


Percentage
•A proportion that has been multiplied by 100.
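 •The formula is [A/(A + B)] × 100.
 •Using the AIDS mortality figures from the ratio calculation, the proportion of male deaths was 450,451/(450,451 + 89,895) ≈ 0.83.
 –The percentage of male deaths from AIDS was (0.83 × 100) = 83%.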
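
The proportion and percentage of male deaths can be checked with the same AIDS mortality counts used in the ratio calculation. The sketch below assumes A is the number of male deaths and B the number of female deaths; the function names are illustrative only.

```python
def proportion(a, b):
    """Proportion of A in the total A + B (the numerator is part of the denominator)."""
    return a / (a + b)


def percentage(a, b):
    """A proportion multiplied by 100."""
    return proportion(a, b) * 100


male_deaths = 450_451    # A
female_deaths = 89_895   # B

male_proportion = proportion(male_deaths, female_deaths)
male_percentage = percentage(male_deaths, female_deaths)
print(f"Proportion: {male_proportion:.2f}, percentage: {male_percentage:.0f}%")
```

Note the parentheses around `a + b`: without them, the expression would divide A by A first and then add B.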


Rate
Also a type of ratio; a rate differs from a proportion because the denominator involves a measure of time.

General Information Regarding Epidemiologic Measures
 Epidemiologic measures provide the following types of information:
 ─frequency of a disease or condition
 ─associations between exposures and health outcomes
 ─strength of the relationship between an exposure and a health outcome

Quantitative Epidemiologic Measures
 characterize the occurrence of disease, morbidity, and mortality in populations.
 Quantitative terms include: Counts, Incidence, Prevalence

Count
 number of cases of a disease or other health phenomenon being studied.
 • single cases may have public health significance
 ─Case reports of patients with particularly unusual presentations or combinations of symptoms often spur epidemiologic investigations.
 ─Ex: one case of Ebola virus

Incidence
occurrence of new disease or mortality within a defined period of observation (time period) in a specified population.

Population at Risk
members of the population who are capable of developing the disease or condition being studied are known as the population at risk.

Incidence Rate
A rate formed by dividing the number of new cases that occur during a time period by the number of individuals in the population at risk
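
A minimal sketch of the incidence rate calculation follows. The definition above does not mention a multiplier; scaling the result to "per 100,000" is an assumption borrowed from the way rates are commonly reported, and the input figures are hypothetical.

```python
def incidence_rate(new_cases, population_at_risk, multiplier=100_000):
    """New cases during a time period per `multiplier` persons at risk.

    The multiplier is an assumed reporting convention, not part of the
    definition given in the notes.
    """
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases * multiplier / population_at_risk


# Hypothetical figures, for illustration only.
rate = incidence_rate(new_cases=50, population_at_risk=25_000)
print(f"{rate:.1f} new cases per 100,000 persons at risk")
```

Only people capable of developing the disease belong in `population_at_risk`; existing cases and people who cannot develop the condition should be excluded before calling the function.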

Prevalence
 Number of existing cases of a disease or health condition, or deaths in a population at some designated time.
 •Variations: point prevalence, period prevalence, and lifetime prevalence

Point Prevalence
 •All cases of a disease, health condition, or deaths that exist at a particular point in time relative to a specific population from which the cases are derived.
 •Formula:
 Point prevalence = (Number of persons ill at a point in time) / (Total number in the group)
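
The point prevalence formula translates directly into code. The counts below are hypothetical and serve only to show the calculation.

```python
def point_prevalence(persons_ill, total_in_group):
    """Existing cases at a point in time divided by the total number in the group."""
    if total_in_group <= 0:
        raise ValueError("group size must be positive")
    return persons_ill / total_in_group


# Hypothetical survey: 120 people ill on the survey date out of 2,400 examined.
prevalence = point_prevalence(persons_ill=120, total_in_group=2_400)
print(f"Point prevalence: {prevalence:.3f} ({prevalence * 100:.1f}%)")
```

Unlike incidence, the numerator counts all existing cases on that date, not only new ones.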

Period Prevalence
All cases of a disease within a period of time

Lifetime Prevalence
 Cases diagnosed at any time during the person’s lifetime
 •Ex: Lifetime asthma diagnosis

Interrelationships Between Incidence and Prevalence
 Factors that cause prevalence to increase:
 ─Increase in incidence
 ─Longer duration of the case
 ─In-migration of cases
 ─Prolongation of life of patients without a cure
 Factors that cause prevalence to decrease:
 ─Decrease in incidence
 ─Shorter duration of disease
 ─In-migration of healthy people
 ─Improved cure rate of disease

Crude Rate
 •A type of rate that has not been modified to take account of any of the factors such as the demographic makeup of the population that may affect the observed rate
 •Includes a time period during which an event occurred.
 •Numerator consists of the frequency of a disease over a specified period of time.
 •Denominator is a unit size of population.
 •Crude rates aid in making comparisons but have limitations.

Crude Death Rate
 •The crude death rate is a type of crude rate.
 •Can be expressed in terms of any unit size of a population that is convenient.
 ─For example, infant mortality is expressed per 1,000 live births.

Reference Population
 •The population from which cases of a disease have been taken
 •Ex: calculation of the annual crude death rate in the United States

Case Fatality Rate (CFR)
 The number of deaths due to a disease that occur among persons who are afflicted with that disease.
 CFR (%) = (Number of deaths due to disease “X” / Number of cases of disease “X”) × 100, during a time period
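
The case fatality rate can be sketched as follows. The function name and the example counts are hypothetical.

```python
def case_fatality_rate(deaths_from_disease, cases_of_disease):
    """Deaths due to a disease per 100 cases of that disease in a time period."""
    if cases_of_disease <= 0:
        raise ValueError("number of cases must be positive")
    return deaths_from_disease * 100 / cases_of_disease


# Hypothetical outbreak: 15 deaths among 300 cases during the period.
cfr = case_fatality_rate(deaths_from_disease=15, cases_of_disease=300)
print(f"CFR: {cfr:.1f}%")
```

The denominator is the number of cases, not the whole population, which is what distinguishes the CFR from a cause-specific mortality rate.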

Proportional Mortality Ratio (PMR)
The number of deaths within a population due to a specific disease or cause divided by the total number of deaths in the population
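
A short sketch of the PMR calculation, using hypothetical counts. Reporting the result as a percentage is an assumption here; the definition above only specifies the division.

```python
def proportional_mortality_ratio(deaths_from_cause, total_deaths):
    """Deaths from a specific cause divided by deaths from all causes."""
    if total_deaths <= 0:
        raise ValueError("total deaths must be positive")
    return deaths_from_cause / total_deaths


# Hypothetical population: 200 of 1,000 deaths were due to the cause of interest.
pmr = proportional_mortality_ratio(deaths_from_cause=200, total_deaths=1_000)
print(f"PMR: {pmr:.2f} ({pmr * 100:.0f}% of all deaths)")
```

Because the denominator is total deaths rather than population size, the PMR describes the share of deaths due to a cause, not the risk of dying from it.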

Cause-Specific Rate
 A measure that refers to mortality (or frequency of a given disease) divided by the population size at the midpoint of a time period, times a multiplier.
 Cause-specific rate = (Mortality (or frequency of a given disease) / Population size at midpoint of time period) × 100,000
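
The cause-specific rate formula can be sketched as below. The input figures are hypothetical, and the parameter names are illustrative.

```python
def cause_specific_rate(events, midpoint_population, multiplier=100_000):
    """Deaths (or cases) from one cause per `multiplier` persons.

    `midpoint_population` is the population size at the midpoint of the
    time period.
    """
    if midpoint_population <= 0:
        raise ValueError("midpoint population must be positive")
    return events * multiplier / midpoint_population


# Hypothetical: 150 deaths from one cause in a midyear population of 500,000.
rate = cause_specific_rate(events=150, midpoint_population=500_000)
print(f"{rate:.1f} deaths per 100,000 population")
```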


Age-Specific Rate
•The number of cases per age group of population during a specified time period

Sex-Specific Rate
 The frequency of a disease in a gender group divided by the total number of persons within that gender group during a time period, times a multiplier

Adjusted Rate
 •A rate of morbidity or mortality in a population in which statistical procedures have been applied to permit fair comparisons across populations by removing the effect of differences in the composition of various populations
 •Age is a factor used in rate adjustment.
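
The notes do not say which statistical procedure is used for adjustment. As one illustration, the sketch below assumes the direct method of age adjustment: each age-specific rate is applied to a standard population, and the expected deaths are summed and divided by the standard population total. All figures, age groups, and names are hypothetical.

```python
def direct_adjusted_rate(deaths, population, standard_population, multiplier=100_000):
    """Age-adjusted rate by the direct method (assumed here, not stated in the notes).

    Each argument is a dict keyed by age group.
    """
    expected_deaths = 0.0
    for age_group, standard_count in standard_population.items():
        # Age-specific rate observed in the study population.
        age_specific_rate = deaths[age_group] / population[age_group]
        # Deaths expected if the standard population had that rate.
        expected_deaths += age_specific_rate * standard_count
    return expected_deaths * multiplier / sum(standard_population.values())


# Hypothetical study population and standard population.
deaths = {"0-39": 10, "40+": 90}
population = {"0-39": 10_000, "40+": 5_000}
standard_population = {"0-39": 60_000, "40+": 40_000}

crude = sum(deaths.values()) * 100_000 / sum(population.values())
adjusted = direct_adjusted_rate(deaths, population, standard_population)
print(f"Crude rate: {crude:.1f} per 100,000")
print(f"Age-adjusted rate: {adjusted:.1f} per 100,000")
```

The crude and adjusted rates differ because the hypothetical study population has a different age composition than the standard population, which is the effect that adjustment is meant to remove.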

