Test #1 Review (econ261)

The flashcards below were created by user shesddeevl on FreezingBlue Flashcards.

  1. Individuals
    The objects described by a set of data. They may be people, animals, or things. For each individual, the data give values for one or more variables.
  2. Variables
    A characteristic of an individual, such as a person's height, sex, or salary.

    Some variables are categorical and others are quantitative.
  3. Categorical variable or Qualitative variable
    Places each individual into a category, such as male or female.

    Arithmetic such as adding or averaging does not make sense for its values.
  4. Quantitative variable
    It has numerical values that measure some characteristic of each individual, such as height in centimeters or salary in dollars.
  5. Exploratory data analysis
    It uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
  6. Plot your data
    This is almost always the first thing to do after you understand the background of your data (individuals, variables, units of measurement).
  7. Distribution of a variable
    Describes what values the variable takes and how often it takes these values.
  8. Pie charts and bar graphs
    Display the distribution of a categorical variable.

    Bar graphs can compare any set of quantities measured in the same units.
  9. Histograms
    They graph the distribution of a quantitative variable.
  10. Measures of Central Tendency
    Mean, median, and mode.
  11. Measures of spread
    • Range: max-min
    • Standard Deviation
    • Interquartile Range
  12. Overall pattern and notable deviations
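    Describe the overall pattern by its shape, center, and spread. Common shapes include symmetric and skewed. Notable deviations, such as outliers, depart from this pattern.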
  13. Outliers
    Observations that lie outside the overall pattern of a distribution.
  14. Numerical summary of a distribution
    Report at least its center and its spread or variability.
  15. Mean (x-bar)
    Arithmetic average of the observations.
  16. Median, M
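    The midpoint of the ordered values: half the observations lie at or below it and half at or above it. With an even number of observations, it is the average of the two middle values.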
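    A minimal Python sketch of these two measures of center, assuming a small hypothetical data list; `mean` and `median` are illustrative helper names, not part of the original cards. The median sorts the data first and averages the two middle values when the count is even.

```python
def mean(xs):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(xs) / len(xs)


def median(xs):
    """Midpoint of the ordered observations."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: average of the two middle values


data = [2, 4, 4, 5, 7, 8]  # hypothetical observations
print(mean(data))    # 5.0
print(median(data))  # 4.5
```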
  17. Quartiles
    When you use the median to indicate the center of the distribution, describe the spread by giving this.
  18. First quartile Q1
    It has one-fourth of the observations below it.
  19. Third quartile Q3
    It has three-fourths of the observations below it.
  20. Five-Number Summary
    Consists of the smallest observation, the first quartile Q1, the median M, the third quartile Q3, and the largest observation. It provides a quick overall description of a distribution.

    The median describes the center, and the quartiles and extremes show the spread.

    A better description than the mean and standard deviation for skewed distributions.
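    The sketch below computes the five-number summary in Python. It assumes one common textbook rule for the quartiles: Q1 is the median of the observations below the median's position in the ordered list, and Q3 is the median of those above it. Statistical software often uses other interpolation rules, so its quartiles may differ slightly. The helper names and data are hypothetical.

```python
def median_of_sorted(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max); needs at least two observations."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]          # observations left of the median's position
    upper = s[half + n % 2:]  # skip the middle value itself when n is odd
    return (s[0], median_of_sorted(lower), median_of_sorted(s),
            median_of_sorted(upper), s[-1])


data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # hypothetical observations
smallest, q1, m, q3, largest = five_number_summary(data)
print(smallest, q1, m, q3, largest)                # 1 4.0 9 14.0 17
print("range:", largest - smallest, "IQR:", q3 - q1)  # range: 16 IQR: 10.0
```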
  21. Boxplots
    A graph of the five-number summary; boxplots are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data.
  22. Variance (s^2) and Standard deviation
    Common measures of spread about the mean as center.

    The standard deviation s is zero when there is no spread and gets larger as the spread increases.
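    As a worked sketch, the sample variance is s^2 = sum of (x_i - x-bar)^2 divided by (n - 1), and the standard deviation s is its square root. The Python below, with hypothetical data, computes both by hand and compares them with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The helper names are illustrative.

```python
import math
import statistics


def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [1, 3, 5]  # hypothetical observations; the mean is 3
print(sample_variance(data), sample_sd(data))  # 4.0 2.0
print(statistics.variance(data), statistics.stdev(data))
print(sample_sd([5, 5, 5]))  # 0.0: no spread
```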
  23. Symmetric Distribution
    Use the mean and standard deviation to describe it.
  24. Resistant measure
    Relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are.

    The median and quartiles are resistant, but the mean and the standard deviation are not.
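    A quick Python illustration of resistance, using hypothetical values: changing only the largest observation to an extreme value moves the mean a great deal but leaves the median unchanged.

```python
from statistics import mean, median

original = [30, 32, 35, 38, 40]  # hypothetical observations
changed = [30, 32, 35, 38, 400]  # same data with the largest value made extreme

print(mean(original), median(original))  # mean 35, median 35
print(mean(changed), median(changed))    # mean 107, median still 35
```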
  25. Skewed
    Use the median and quartiles (the five-number summary), often displayed in a boxplot.
  26. Mean and standard deviation
    Use them for reasonably symmetric distributions without outliers.

    Neither is a resistant measure.
  27. Density curve
    It is always on or above the horizontal axis and has an area of exactly 1 underneath it.

    An area under a density curve gives the proportion of observations that fall in a range of values.

    An idealized description of the overall pattern of a distribution that smooths out the irregularities in the actual data.
  28. Normal Density Curve also called Normal distributions
    Normal curves are symmetric, single-peaked, and bell-shaped.

    The mean and standard deviation completely specify a Normal distribution.
  29. 68-95-99.7 Rule
    In a Normal distribution, approximately 68% of the observations fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
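    The rule can be checked numerically. For a Normal distribution, the proportion of observations within k standard deviations of the mean is erf(k / sqrt(2)), where erf is the error function available as Python's `math.erf`. The exact values, about 68.3%, 95.4%, and 99.7%, show that the rule's percents are rounded approximations. The helper name is illustrative.

```python
import math


def within_k_sd(k):
    """Proportion of a Normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))  # 1 68.3, then 2 95.4, then 3 99.7
```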
  30. Explanatory variable
    May explain or influence changes in a response variable.
  31. Response variable
    Measures an outcome of a study.
  32. Scatterplot
    Displays the relationship between two quantitative variables measured on the same individuals.

    Plot points with different colors or symbols to see the effect of a categorical variable in the scatterplot.
  33. Overall pattern of Scatterplot
    Look for the direction (positive or negative), form (such as a linear relationship or clusters), and strength (how closely the points follow the form) of the relationship, and then for outliers or other deviations from this pattern.
  34. Correlation r
    Measures the direction and strength of the linear association between two quantitative variables x and y.

    Both variables must be quantitative; r measures ONLY linear association; r is NOT resistant.
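    A Python sketch of r using the textbook formula r = (1 / (n - 1)) times the sum of [(x_i - x-bar) / s_x][(y_i - y-bar) / s_y], the average product of standardized values. The data sets and helper names are hypothetical. The last example, a perfect curved (U-shaped) relationship, gives r of essentially 0, which shows that r measures only linear association.

```python
import math


def sample_sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))


def correlation(x, y):
    """Average product of the standardized x and y values, using n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x, s_y = sample_sd(x), sample_sd(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)


x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))  # close to 1: perfect positive line
print(correlation(x, [10, 8, 6, 4, 2]))  # close to -1: perfect negative line
curve_x = [-2, -1, 0, 1, 2]
print(correlation(curve_x, [v ** 2 for v in curve_x]))  # essentially 0: strong but not linear
```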
  35. Regression line
    Straight line that describes how a response variable y changes as an explanatory variable x changes.

    Use this to predict the value of y for any value of x by substituting this x into the equation of the line.
  36. Slope b
    y-hat = a + bx; the amount by which the predicted response y-hat changes when the explanatory variable x increases by one unit.
  37. Intercept a
    y-hat = a + bx; predicted response y-hat when the explanatory variable x=0.
  38. Least-Squares Regression Line
    The straight line y-hat = a + bx that minimizes the sum of the squares of the vertical distances of the observed points from the line.

    The line always passes through the point (x-bar, y-bar).
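    A minimal Python sketch of fitting the least-squares line by hand, assuming the standard formulas b = sum of (x_i - x-bar)(y_i - y-bar) divided by sum of (x_i - x-bar)^2 (equivalently b = r * s_y / s_x) and a = y-bar - b * x-bar. The data and the helper name are hypothetical. The last lines check that the line passes through (x-bar, y-bar) and predict y-hat at an x inside the range of the data.

```python
def least_squares(x, y):
    """Return (a, b) for y-hat = a + b*x minimizing the sum of squared vertical distances."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


x = [1, 2, 3, 4]
y = [3, 5, 7, 10]  # hypothetical observations
a, b = least_squares(x, y)
print(round(a, 4), round(b, 4))  # 0.5 2.3

x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(abs((a + b * x_bar) - y_bar) < 1e-12)  # True: the line passes through (x-bar, y-bar)
print(round(a + b * 3.5, 4))  # 8.55: predicted y-hat at x = 3.5
```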
  39. Square of the correlation (r^2)
    The fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
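    The Python sketch below, using hypothetical data, computes r^2 two ways: as the square of the correlation, and as 1 - SSE/SST, where SSE is the sum of squared residuals about the least-squares line and SST is the sum of squared deviations of y about y-bar. For a least-squares line the two agree. The helper name is illustrative.

```python
import math


def r_squared_two_ways(x, y):
    """Return (r**2, 1 - SSE/SST) for the least-squares regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # SST: total variation in y

    r = sxy / math.sqrt(sxx * syy)  # correlation

    b = sxy / sxx          # least-squares slope
    a = y_bar - b * x_bar  # least-squares intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation

    return r ** 2, 1 - sse / syy


from_r, from_residuals = r_squared_two_ways([1, 2, 3, 4], [3, 5, 7, 10])
print(round(from_r, 4), round(from_residuals, 4))  # 0.9888 0.9888
```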
  40. Influential observations
    Individual points that substantially change the correlation or the regression line.  Outliers in the x direction are often influential for the regression line.
  41. Ecological correlation
    A correlation based on averages rather than on individuals. Such correlations tend to be stronger than correlations based on individuals.
  42. Extrapolation
    The use of a regression line for prediction far outside the range of values of the explanatory variable x used to calculate the line. Such predictions are often unreliable.
  43. Lurking variables
    Variables that are not among the explanatory or response variables in a study but may still explain or influence the relationship between them.