The distribution of a variable tells us what values the variable takes and how often it takes these values.
What are examples of variables?
Age, height, blood pressure, ethnicity, leaf length, first language.
What does it mean for a variable to be quantitative?
It takes numerical values for which arithmetic operations such as adding and subtracting make sense. Example: how many credit cards you have.
What does it mean for a variable to be categorical?
It records which one of several categories an individual falls into. What can be counted is the count or proportion of individuals in each category.
Example: Hair color, ethnicity, whether you paid your income tax.
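As a small illustration of "the count or proportion of individuals in each category", here is a minimal Python sketch. The `hair_colors` sample is made up for illustration and is not data from these notes.

```python
from collections import Counter

# Hypothetical sample: hair color recorded for 8 individuals.
hair_colors = ["brown", "black", "brown", "blond", "brown", "black", "red", "brown"]

counts = Counter(hair_colors)   # number of individuals in each category
n = len(hair_colors)            # number of individuals in the sample
proportions = {color: count / n for color, count in counts.items()}

for color, count in counts.most_common():
    print(f"{color}: count={count}, proportion={proportions[color]:.3f}")
```

Arithmetic on the category labels themselves would be meaningless, which is why only counts and proportions are summarized.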
How do you know if a variable is categorical or quantitative?
Ask who or what the n individuals/units in the sample are, and what is being recorded about each of them. Is it a number (quantitative) or a statement (categorical)?
Example: For each patient, survival status (e.g., Patient A dies) and diagnosis are categorical, while age is quantitative.
What is a Pareto Chart?
It is a bar chart whose bars are sorted by rank (from the most frequent category to the least). This is much more useful than sorting alphabetically.
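The difference between alphabetical order and rank order can be sketched in a few lines of Python. The category names and counts below are hypothetical, and the text bars are only a stand-in for a real chart.

```python
# Hypothetical counts per category (not from the notes).
counts = {"asthma": 12, "diabetes": 30, "flu": 45, "migraine": 7}

# Alphabetical order: the default for many tools, but it hides the ranking.
alphabetical = sorted(counts.items())

# Pareto order: sort by count, largest first.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for label, count in pareto:
    print(f"{label:<10} {'#' * (count // 3)} {count}")
```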
What are ways to chart quantitative data?
-Histograms and stemplots
These two are useful for a single variable and for understanding the pattern of variability in the data.
-Line graphs: time plots
Use when there is a meaningful sequence, like time. They show change over time.
What is a skyscraper of data in a chart?
Almost all of the data is in one category so the graph doesn’t provide much information or insight.
What is a pancake graph?
There are too many categories with just a few observations in each, so again the graph provides little information.
How can we describe a histogram?
By its shape, center, and spread.
A smooth curve highlighting the overall pattern is good. A line connecting the tops of the columns is too detailed.
What is a symmetric distribution?
A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
What is a skewed distribution?
A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
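One way to put a number on the direction of skew is the moment-based skewness coefficient. This measure is not part of the notes; it is a common convention, sketched here with made-up data. A positive value suggests a longer right tail and a negative value a longer left tail.

```python
def skewness(values):
    """Moment-based skewness: positive = right-skewed, negative = left-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets chosen to show each shape.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # long tail toward large values
left_skewed = [-x for x in right_skewed]          # mirror image

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name}: skewness = {skewness(data):+.2f}")
```

A histogram is still the first thing to look at; the coefficient only summarizes what the picture already shows.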
What is a complex multimodal distribution?
A distribution with several peaks (modes) and no simple overall shape. Not all distributions have a simple overall shape, especially when there are few observations.
What is an Outlier?
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. A large gap in the distribution is typically a sign of an outlier.
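The "large gap" idea can be sketched by sorting the data and looking for the biggest jump between neighboring values. The helper `largest_gap` and the sample values are hypothetical, and this is only a rough screening aid, not a formal outlier rule.

```python
def largest_gap(values):
    """Return (gap, low, high) for the widest gap between neighboring sorted values."""
    ordered = sorted(values)
    gaps = [(high - low, low, high) for low, high in zip(ordered, ordered[1:])]
    return max(gaps)

# Hypothetical measurements with one value far from the rest.
data = [61, 63, 64, 64, 66, 67, 68, 70, 95]

gap, low, high = largest_gap(data)
print(f"Largest gap is {gap}, between {low} and {high}")
```

A flagged value still needs an explanation (a recording error, a different population, or a genuine extreme) before deciding what to do with it.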
Can you force your data into a shape?
No. Your data are the way they are. Do not try to force them into a particular shape. It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.
Do stemplots work well for large datasets?
No. With many observations a stemplot becomes unwieldy; a histogram works better.
When should you use a back to back stemplot?
To compare two related distributions.
Compare stemplots and histograms.
Stemplots are quick and dirty histograms that can easily be done by hand, which makes them very convenient for back-of-the-envelope calculations. However, they are rarely found in scientific or lay publications.
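To show how mechanical a stemplot is, here is a minimal Python sketch. It assumes non-negative integers, with the tens as the stem and the ones digit as the leaf; the function name `stemplot` and the `scores` data are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data.
scores = [23, 25, 31, 34, 34, 38, 52, 57]
print("\n".join(stemplot(scores)))
```

Turned on its side, the output has the same outline as a histogram with class width 10, while still showing the individual values.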
What is a trend?
A trend is a rise or fall that persists over time, despite small irregularities.
What is seasonal variation?
A pattern that repeats itself at regular intervals of time is called seasonal variation.
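One common way to separate a trend from seasonal variation is a moving average over one full seasonal period. This technique is not described in the notes; the sketch below uses a made-up series built from a steady rise plus a repeating pattern of period 4.

```python
period = 4
seasonal = [2, -1, -2, 1]   # hypothetical seasonal effect; sums to zero over one period

# Hypothetical series: an upward trend (t) plus the repeating seasonal effect.
values = [t + seasonal[t % period] for t in range(12)]

# Averaging over one full period cancels the seasonal effect and leaves the trend.
moving_avg = [
    sum(values[i:i + period]) / period
    for i in range(len(values) - period + 1)
]

print("raw values:    ", values)
print("moving average:", moving_avg)
```

The raw values zigzag, while the moving average rises steadily, which is the persistent rise that the definition of a trend refers to.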
Does Scale matter?
Yes. How you stretch the axes and choose your scales can give a different impression.
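Are mean and average the same thing?
Yes. In these notes, “average” refers to the arithmetic mean.
What is an average (mean)?
The center of mass (balance point) of the data: the sum of the values divided by the number of values.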
What is the median?
The midpoint of a distribution. Half the observations are larger and half are smaller.
When are mean and median the same?
The mean and the median are the same if the distribution is exactly symmetric. In a skewed distribution, the mean is pulled toward the longer tail. The median is a measure of center that is resistant to skew and outliers. The mean is not.
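The resistance of the median can be checked with a tiny example using Python's standard `statistics` module. Both data sets are hypothetical; the second differs from the first only in its largest value.

```python
from statistics import mean, median

symmetric = [2, 3, 4, 5, 6]        # symmetric around 4
with_outlier = [2, 3, 4, 5, 60]    # same data, largest value replaced by an outlier

print("symmetric:   ", mean(symmetric), median(symmetric))
print("with outlier:", mean(with_outlier), median(with_outlier))
```

The outlier drags the mean well above most of the data, while the median does not move at all.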
What are the first and third quartiles?
The first quartile (Q1) is the value with 25% of the data at or below it, whereas the third quartile (Q3) is the value with 75% of the data at or below it.
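A minimal sketch of computing quartiles with Python's standard library (`statistics.quantiles`, available from Python 3.8). The data are hypothetical. Note that there are several conventions for quartiles; the `"inclusive"` method used here is an assumption and may give slightly different values from the by-hand rule in a textbook.

```python
from statistics import quantiles

# Hypothetical data: the integers 1 through 9.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"Share of data at or below Q1: {sum(x <= q1 for x in data) / len(data):.2f}")
```

With small samples the share at or below Q1 is only approximately 25%, which is why the different conventions exist.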
Why are boxplots good?
Boxplots remain true to the data and clearly depict symmetry or skew.
What is Standard Deviation?
The standard deviation is very close to the average distance from the mean to each of the individual data points.
So as the points are more distant from the mean, the standard deviation will be larger.
Greater spread (concept) → Larger standard deviation (measure)
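The following Python sketch computes the standard deviation by hand and compares it with the plain average distance from the mean. The data are made up, and the notes do not say whether to divide by n or n - 1, so both versions are shown as an assumption.

```python
import math
from statistics import stdev

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

sum_sq = sum((x - mean) ** 2 for x in data)          # sum of squared deviations
sd_sample = math.sqrt(sum_sq / (n - 1))              # divides by n - 1 (sample version)
sd_population = math.sqrt(sum_sq / n)                # divides by n (population version)
avg_distance = sum(abs(x - mean) for x in data) / n  # plain average distance from the mean

# Stretching the data away from the mean increases the standard deviation.
wide = [mean + 2 * (x - mean) for x in data]

print(f"sample SD = {sd_sample:.3f}, population SD = {sd_population:.3f}")
print(f"average distance from mean = {avg_distance:.3f}")
print(f"SD after doubling every deviation = {stdev(wide):.3f}")
```

The standard deviation and the average distance are of similar size but not identical, because squaring gives extra weight to the points farthest from the mean.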
What is a normal density curve?
If the overall pattern of a large number of observations follows a certain shape, then we can approximately describe it with a smooth “Normal” curve with very little loss of information.
The curve is the result of a mathematical model (or idealized description) of the distribution
The curve does not depend on our choice for the number of classes the way a histogram does
Normal density curves have certain characteristics that make them easy to work with
A density curve describes....
The overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range.
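The "area equals proportion" idea can be sketched numerically. The example below uses the standard Normal density (mean 0, standard deviation 1) as an assumed example curve and approximates areas with the trapezoid rule; the helper names are made up for illustration.

```python
import math

def normal_pdf(x):
    """Standard Normal density (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

total_area = area_under(normal_pdf, -8, 8)   # essentially the whole curve
middle = area_under(normal_pdf, -1, 1)       # proportion within 1 SD of the mean

print(f"total area ~ {total_area:.4f}")
print(f"area between -1 and 1 ~ {middle:.4f}")
```

The total area comes out as 1 (all observations), and the area over a range is the proportion of observations expected in that range.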
The median of a density curve is...
The median of a density curve is the equal-areas point, the point that divides the area under the curve in half.
The mean of a density curve is...
The mean of a density curve is the balance point, at which the curve would balance if made of solid material.
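To see the equal-areas point and the balance point differ, here is a rough numerical sketch on a right-skewed density. The exponential curve f(x) = e^(-x) for x >= 0 is an assumed example, not one from the notes, and the integrals are approximated on a fine grid cut off at x = 40.

```python
import math

def exp_pdf(x):
    """Right-skewed example density: f(x) = e^(-x) for x >= 0."""
    return math.exp(-x)

step = 0.001
n_steps = 40_000                      # grid from 0 to 40; the tail beyond is negligible
xs = [i * step for i in range(n_steps + 1)]

mean = 0.0        # balance point: integral of x * f(x)
area = 0.0        # running area under the curve
median = None     # equal-areas point: where the running area reaches one half

for x0, x1 in zip(xs, xs[1:]):
    mean += 0.5 * (x0 * exp_pdf(x0) + x1 * exp_pdf(x1)) * step
    area += 0.5 * (exp_pdf(x0) + exp_pdf(x1)) * step
    if median is None and area >= 0.5:
        median = x1

print(f"median ~ {median:.3f}, mean ~ {mean:.3f}")
```

For this skewed curve the mean lies to the right of the median, pulled toward the long tail; for a symmetric curve the two points coincide.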
What does this notation mean for Normal distribution:
N(9.12, 0.15)
Normal distribution with a mean of 9.12 and a standard deviation of 0.15.
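The same distribution can be explored with `statistics.NormalDist` from Python's standard library (Python 3.8+). The second number is treated as the standard deviation, as in these notes; some other sources write the variance in that position. The cutoff of 9.0 is an arbitrary value chosen for illustration.

```python
from statistics import NormalDist

# N(9.12, 0.15): mean 9.12, standard deviation 0.15.
dist = NormalDist(mu=9.12, sigma=0.15)

# Proportion of observations within one standard deviation of the mean.
within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)

# Proportion of observations below an arbitrary cutoff of 9.0.
below_9 = dist.cdf(9.0)

print(f"within 1 SD of the mean: {within_one_sd:.3f}")
print(f"below 9.0: {below_9:.3f}")
```

Each proportion is an area under the Normal density curve over the corresponding range of values.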