The flashcards below were created by user
shesddeevl
on FreezingBlue Flashcards.

Individuals
The objects described by a set of data. They may be people, animals, or things. For each individual, the data give values for one or more variables.

Variables
Describes some characteristic of an individual, such as a person's height, sex, or salary.
Some variables are categorical and others are quantitative.

Categorical variable or Qualitative variable
Places each individual into a category, such as male or female.
Arithmetic on its values does not make sense.

Quantitative variable
It has numerical values that measure some characteristic of each individual, such as height in centimeters or salary in dollars.

Exploratory data analysis
It uses graphs and numerical summaries to describe the variables in a data set and the relations among them.

Plot your data
This is almost always the first thing to do after you understand the background of your data (individuals, variables, units of measurement).

Distribution of a variable
Describes what values the variable takes and how often it takes these values.
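
As an illustration, a minimal Python sketch that tabulates the distribution of a categorical variable as counts and proportions. The `majors` data are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical categorical data: one value per individual.
majors = ["math", "biology", "math", "art", "biology", "math"]

counts = Counter(majors)   # how often each value occurs
n = len(majors)
proportions = {value: count / n for value, count in counts.items()}

# List the values from most to least frequent.
for value, count in counts.most_common():
    print(f"{value}: count={count}, proportion={proportions[value]:.2f}")
```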

Pie charts and bar graphs
Display the distribution of a categorical variable.
Bar graphs can compare any set of quantities measured in the same units.

Histograms
They graph the distribution of a quantitative variable.

Measures of Central Tendency
Mean, median, and mode.
An outlier is not a measure of center; it is an observation that lies outside the overall pattern.

Measures of spread
Range: max - min
Standard deviation
Interquartile range

Overall pattern and notable deviations
Describe the overall pattern by its shape, center, and spread. Some shapes have names, such as symmetric or skewed.

Outliers
Observations that lie outside the overall pattern of a distribution.

Numerical summary of a distribution
Report at least its center and its spread or variability.

Mean (x-bar)
Arithmetic average of the observations.

Median, M
The midpoint of the ordered values: half the observations are smaller and half are larger.

Quartiles
When you use the median to indicate the center of the distribution, describe the spread by giving these.

First quartile Q1
It has one-fourth of the observations below it.

Third quartile Q3
It has three-fourths of the observations below it.

Five-Number Summary
Consists of the median, the quartiles, and the smallest and largest individual observations. It provides a quick overall description of a distribution.
The median describes the center, and the quartiles and extremes show the spread.
A better description than the mean and standard deviation for skewed distributions.
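
A minimal Python sketch of the five-number summary. It assumes one common textbook convention for quartiles (Q1 is the median of the observations below the median's position, Q3 the median of those above); other conventions give slightly different quartiles. The function name and the data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # observations below the median's position
    upper = xs[half + n % 2:]    # observations above the median's position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data set, used only for illustration.
data = [7, 1, 15, 4, 20, 9, 3, 12, 6]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1   # interquartile range: spread of the central half
print(minimum, q1, med, q3, maximum, iqr)
```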

Boxplots
Graphs based on the five-number summary; they are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data.

Variance (s^2) and Standard deviation
Common measures of spread about the mean as center.
The standard deviation s is zero when there is no spread and gets larger as the spread increases.
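
A minimal Python sketch of the calculation, assuming the usual sample definitions (divide the sum of squared deviations from the mean by n - 1, then take the square root). The data are hypothetical.

```python
from math import sqrt
import statistics

# Hypothetical observations, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n                             # mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # squared deviations from the mean
variance = sum(squared_devs) / (n - 1)            # s^2, assuming the n - 1 divisor
sd = sqrt(variance)                               # s

print(round(variance, 3), round(sd, 3))

# With no spread, the standard deviation is zero.
no_spread_sd = statistics.stdev([3, 3, 3])
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can be used to check the hand calculation.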

Symmetric Distribution
Use the mean and standard deviation to describe it.

Resistant measure
Relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are.
The median and quartiles are resistant, but the mean and the standard deviation are not.
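
A short Python sketch of what resistance means in practice: replacing one observation with an extreme value moves the mean a long way and leaves the median unchanged. The salary figures are hypothetical.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 35, 38, 40]

# Replace the largest observation with an extreme value.
with_outlier = salaries[:-1] + [400]

mean_before, mean_after = mean(salaries), mean(with_outlier)
median_before, median_after = median(salaries), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # changes a lot
print("median:", median_before, "->", median_after)  # does not change
```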

Skewed
Use the median and quartiles (the five-number summary) and a boxplot to describe it.

Mean and standard deviation
Best for reasonably symmetric distributions without outliers.
Both are nonresistant measures.

Density curve
It has a total area 1 underneath it.
An area under a density curve gives the proportion of observations that fall in a range of values.
An idealized description of the overall pattern of a distribution that smooths out the irregularities in the actual data.

Normal density curves (also called Normal distributions)
They are symmetric, single-peaked, and bell-shaped.

68-95-99.7 Rule
Describes what percent of observations in a Normal distribution lie within one, two, and three standard deviations of the mean: about 68%, 95%, and 99.7%.
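
The rule can be checked numerically. This minimal Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8 or later) to compute the area under the standard Normal curve within k standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -k and +k standard deviations.
proportions = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, p in proportions.items():
    print(f"within {k} standard deviation(s): {100 * p:.1f}%")
```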

Explanatory variable
Plotted on the x-axis.


Scatterplot
Displays the relationship between two quantitative variables measured on the same individuals.
Plot points with different colors or symbols to see the effect of a categorical variable in the scatterplot.

Overall pattern of Scatterplot
Look for the direction (positive or negative), form (linear relationship or clusters), and strength (how closely the points follow the form) of the relationship, and then for outliers or other deviations from this pattern.

Correlation r
Measures the direction and strength of the linear association between two quantitative variables x and y.
Both variables must be quantitative; r measures ONLY linear association; it is NOT resistant.
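
A minimal Python sketch of one standard way to compute r: standardize each variable, multiply the standardized values pairwise, and divide the sum by n - 1. The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Sum of products of standardized values, divided by n - 1.
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical data, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

Because the values are standardized, r has no units and always falls between -1 and 1; points lying exactly on a line with positive slope give r = 1.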

Regression line
Straight line that describes how a response variable y changes as an explanatory variable x changes.
Use this to predict the value of y for any value of x by substituting this x into the equation of the line.

Slope b
In y-hat = a + bx, the amount by which the predicted response y-hat changes when the explanatory variable x increases by one unit.

Intercept a
In y-hat = a + bx, the predicted response y-hat when the explanatory variable x = 0.

Least-Squares Regression Line
Straight line y-hat = a + bx that minimizes the sum of the squares of the vertical distances of the observed points from the line.
The line always passes through the point (x-bar, y-bar), the means of x and y.
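
A minimal Python sketch of the least-squares calculation, using the standard closed-form solution: the slope is the sum of products of deviations divided by the sum of squared x deviations, and the intercept is chosen so the line passes through the point of means. The data and the `predict` helper are hypothetical.

```python
from statistics import mean

# Hypothetical data, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)

# Slope b: sum of products of deviations over sum of squared x deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx

# Intercept a: forces the line through the point of means.
a = y_bar - b * x_bar

def predict(x):
    """Predicted response for a given value of the explanatory variable."""
    return a + b * x

print(round(a, 2), round(b, 2))
print(round(predict(x_bar), 2), y_bar)  # prediction at the mean of x vs. mean of y
```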

Square of the correlation (r^2)
The fraction of the variation in one variable that is explained by least-squares regression on the other variable.
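
A short Python sketch that checks this interpretation numerically on hypothetical data: the squared correlation equals one minus the ratio of leftover (residual) variation to total variation in y.

```python
from statistics import mean

# Hypothetical data, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(xs), mean(ys)

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

b = s_xy / s_xx                   # least-squares slope
a = y_bar - b * x_bar             # least-squares intercept
r = s_xy / (s_xx * s_yy) ** 0.5   # correlation

# Variation in y left over after fitting the line.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - residual_ss / s_yy

print(round(r ** 2, 4), round(explained_fraction, 4))
```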

Influential observations
Individual points that substantially change the correlation or the regression line. Outliers in the x direction are often influential for the regression line.

Ecological correlation
Tendency for correlations based on averages to be stronger than correlations based on individuals.

Extrapolation
The use of a regression line for prediction for values of the explanatory variable far outside the range of the data from which the line was calculated.

Lurking variables
Variables not among those studied that may explain the relationship between the explanatory and response variables.

