Math 1040 Chapter 4

Card Set Information

Author:
bpulsipher
ID:
296331
Filename:
Math 1040 Chapter 4
Updated:
2015-03-12 20:08:04
Tags:
Maw maw Alia alia math 1040 slcc
Folders:

Description:
A list of keyterms and concepts from Chapter 4, Spring 2015.

The flashcards below were created by user bpulsipher on FreezingBlue Flashcards.


  1. Define bivariate data:
    data in which two variables are measured on an individual.

    (e.g., we might want to know whether the amount of cola consumed per week is related to a person's bone density. The individuals would be the people in the study, and the two variables would be the amount of cola consumed weekly and bone density.)

    Before we can represent bivariate data graphically, we must decide which variable will be used to predict the value of the other variable.
  2. What is the response variable?
    The dependent variable: the variable whose value can be explained by the value of the explanatory (predictor or independent) variable.

    (dependent on the independent variable)
  3. What is a scatter diagram? What are its axes?
    a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram.

    The explanatory variable is plotted on the horizontal (x) axis, and the response variable is plotted on the vertical (y) axis.
  4. How can one tell when two variables have a positive association? A negative association?
    two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases (directly related).

    two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases (inversely related).
  5. Just as we can __________ the scale of graphs of univariate data, we can also __________ the scale of graphs of _________ data, possibly resulting in _________ conclusions.

    Therefore, _________ summaries of _________ data should be used __ _________ __ graphs to determine ___ ________ that exists between two variables.
    manipulate, manipulate, bivariate, incorrect

    numerical, bivariate, in addition to, any relation
  6. What is the linear correlation coefficient? The formula?
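    a measure of the strength and direction of the linear relation between two quantitative variables.

    $r = \dfrac{\sum \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)}{n - 1}$

    $\bar{x}$ is the sample mean of the explanatory (independent) variable

    $s_x$ is the sample standard deviation of the explanatory (independent) variable

    $\bar{y}$ is the sample mean of the response (dependent) variable

    $s_y$ is the sample standard deviation of the response (dependent) variable

    $n$ is the number of individuals in the sample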
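
A minimal sketch of how the linear correlation coefficient could be computed by hand: standardize each x and y value with its sample mean and sample standard deviation, sum the products, and divide by n - 1. The data values and the function names below are made up for illustration and are not from the course.

```python
import math


def sample_mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sample_mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def linear_correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sample_mean(xs), sample_mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: x is the explanatory variable, y is the response variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = linear_correlation(xs, ys)
print(round(r, 3))
```

Points that fall exactly on a line with positive slope, such as `[1, 2, 3]` paired with `[2, 4, 6]`, should give a value of 1 up to floating-point rounding.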
  7. What is the value of the linear correlation coefficient for data that fall exactly on a straight line with a positive slope? With a negative slope?

    Does a correlation coefficient close to zero imply that there is no relation? Why or why not?
    1 (positive), -1 (negative)

    • No, it just implies that there is no linear relation.

    The closer r is to 1 or -1, the closer the data values are to falling on a perfectly straight line.
  8. What does a quadratic (U-shaped) relation on a scatter plot mean? What value would the linear correlation coefficient for a quadratic (U-shaped) relation likely be close to? Does this mean the values on a scatter plot with a quadratic relation have no relation?
    It means that the data lie in a "U" shape on the scatter plot.

    Zero.

    No, it means that the values may have a relation but have little or no linear relation.
  9. T or F

    The correlation coefficient only applies to data that is linearly related.
    True
  10. T or F

    The linear correlation coefficient is a resistant measure of linear association.
    False, the linear correlation coefficient is not a resistant measure of linear association
  11. List the three steps for testing a linear relation:
    (1) Determine the absolute value of the correlation coefficient.

    (2) Find the critical value for the given sample size in the table of critical values for the correlation coefficient.

    (3) If the absolute value of the correlation coefficient (r) is greater than the critical value, we say that a linear relation exists between the two variables. Otherwise, no linear relation exists.
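
The three steps above can be sketched as a single comparison. This is only an illustration: the function name is hypothetical, and the critical value is assumed to have been looked up by the reader for the sample size at hand; the number used below is a stand-in for that lookup.

```python
def linear_relation_exists(r, critical_value):
    """Return True when |r| is greater than the critical value."""
    return abs(r) > critical_value


# Hypothetical inputs: r computed from a sample, and a critical value
# assumed to come from the table for that sample size.
r = -0.92
critical_value = 0.878  # assumed lookup result, for illustration only
print(linear_relation_exists(r, critical_value))
```

Because the test uses the absolute value, a strongly negative r passes the test just as a strongly positive one does.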
  12. T or F (ch.1 recap)

    If data used in a study are observational, we can conclude that the two correlated variables have a causal relationship.
    F, we cannot conclude that the two correlated variables have a causal relationship.
  13. Is there another way two variables can be correlated without a causal relationship existing? If not or if so, explain.
    Yes, through a lurking variable.

    Recall that a lurking variable is an explanatory variable that was not considered in the study, but affects the response variable.

    Example: does an increase in ice cream sales at the beach cause an increase in shark attacks? Not necessarily. The lurking variable may be higher temperatures in the summer. Higher temperatures may cause more people to go to the beach to cool down (where there are sharks in the water) and to eat ice cream while there.
  14. T or F

    Correlation implies causation.
    False, correlation does not imply causation
  15. 4.2
    What is the residual for an observation?
    The difference between the observed value of y and the predicted value of y: residual = $y - \hat{y}$. This is the error, or residual. The smaller the residual is in absolute value, the better the prediction.

    It can also be thought of as the vertical distance between an observed point on the scatter plot and the regression line.
  16. 4.2
    What is the least-squares regression line?
    It is a line that minimizes the sum of the squared residuals (or errors or vertical distance) between the observed values of y and those predicted by the line, $\hat{y}$ (y-hat). We represent this as

    minimize $\sum \text{residuals}^2 = \sum (y - \hat{y})^2$

    This is the line that best describes the relation between two variables, and it is based on residuals (errors).
  17. 4.2
    T or F

    The advantage of the least-squares criterion is that it does not allow for statistical inference on the predicted value and slope.
    False, it does allow for statistical inference on the predicted value and slope.
  18. 4.2
    What is the formula for the least-squares regression line? What is the formula for the slope of the least-squares regression line? What is the formula for the y-intercept of this line?
    $\hat{y} = b_1 x + b_0$

    $b_1 = r \cdot \dfrac{s_y}{s_x}$ is the slope of this line

    $b_0 = \bar{y} - b_1 \bar{x}$ is the y-intercept

    r is the linear correlation coefficient

    $\bar{x}$ is the sample mean and $s_x$ is the sample standard deviation of the independent (explanatory) variable.

    $\bar{y}$ is the sample mean and $s_y$ is the sample standard deviation of the dependent (response) variable.
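
A short sketch of the slope and y-intercept calculation, assuming the slope is r times the ratio of the sample standard deviations (response over explanatory) and the y-intercept is the mean of y minus the slope times the mean of x. The data and function name are invented for illustration.

```python
import math


def least_squares_line(xs, ys):
    """Return (b1, b0): slope and y-intercept of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Linear correlation coefficient from standardized values.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b1, b0


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1, b0 = least_squares_line(xs, ys)
print(round(b1, 4), round(b0, 4))

# Predicted value of y at the mean of x.
x_bar = sum(xs) / len(xs)
y_hat_at_mean = b1 * x_bar + b0
```

Here `y_hat_at_mean` works out to the mean of y, which illustrates that the fitted line passes through the point of means.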
  19. 4.2
    T or F

    The least-squares regression line $\hat{y} = b_1 x + b_0$ always contains the point $(\bar{x}, \bar{y})$.
    True
  20. 4.2
    T or F

    Because $s_x$ and $s_y$ must both be negative, the sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, $b_1$, are never the same.
    False. Because $s_x$ and $s_y$ must both be positive, the sign of the linear correlation coefficient, r, and the sign of the slope of the least-squares regression line, $b_1$, are the same.
  21. 4.2
    T or F

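    The predicted value of y, $\hat{y}$, has an interesting interpretation. It is an estimate of the mean value of the explanatory variable for any value of the explanatory variable.
    False. It is an estimate of the mean value of the response variable for any value of the explanatory variable.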
  22. 4.2
    We round the (slope/x-intercept) to (four/three/two/one) decimal places. We round predictions to (one/two/three/none) more decimal place(s) than the raw data of the response (dependent) variable. This is the (same/different) rounding rule we use for the mean.
    slope, four, one, same
  23. 4.2
    What is the difference in interpreting the slope of an ordinary algebraic line (y=mx+b) and the slope of the least-squares regression line $\hat{y} = b_1 x + b_0$?
    Interpreting the slope of a least-squares regression line has a minor twist. Statistical models such as a least-squares regression equation are probabilistic. This means that any predictions or interpretations made from the model carry uncertainty.

    Therefore, when we interpret the slope of a least-squares regression equation, we do not want to imply that there is 100% certainty behind the interpretation.

    This is because the least-squares regression equation describes averages. The algebraic line y=mx+b is not based on averages, so we can say that if x increases by one, then y will increase or decrease by exactly the amount the slope indicates. This cannot be said of the least-squares regression line.

    With the least-squares regression line, a change in x corresponds to a change in y only "on average" or as "expected".
  24. 4.2
    The y-intercept on a graph is the point where...
    the graph intersects with the vertical axis.
  25. 4.2
    When interpreting the y-intercept:

    In general, we interpret a y-intercept as being the value of the response variable when...
    the value of the explanatory variable is "zero". It is found by letting "x" equal zero and solving for y.
  26. 4.2
    When interpreting the y-intercept, there are two conditions. The first:

    In order to interpret the y-intercept, we must first ask two questions. What are they?
    (1) Is 0 a reasonable value for the explanatory variable?

    (2) Do any observations near x=0 exist in the data set?

    If the answer to either question is "no", then we do not interpret the y-intercept.
  27. 4.2

    When interpreting the y-intercept, there are two conditions. The second:
    We should not use the regression model to make predictions outside the scope of the model.

    This means that we should not use the regression model to make predictions for values of the explanatory variable that are much larger or much smaller than those observed.

    This is a dangerous practice because we cannot be certain of the behavior of data for which we have no observations.
  28. 4.2

    When the correlation coefficient indicates (some kind of a/no) linear relation between the ___________ and ________ variables and the scatter diagram indicates (a/no) relation between the variables, then we use the (median/mean/mode) value of the response variable as the _________ value so that $\hat{y} = \bar{y}$.
    no, explanatory (independent), response (dependent), no, mean, predicted
  29. 4.2

    Recall that the least-squares regression line minimizes the sum of the squared residuals. This means that...
    the sum of the squared residuals, $\sum \text{residuals}^2$, is smaller for the least-squares line than for any other line that may describe the relation between the two variables.
  30. 4.2

    T or F

    When the correlation coefficient indicates no linear relation between the explanatory and response variables and the scatter diagram indicates no relation between the variables, we use the median value of the response variable as the predicted value.
    False. We use the mean value of the response variable as the predicted value, so that $\hat{y} = \bar{y}$.
  31. 4.3

    What does the coefficient of determination, $R^2$, measure?
    $R^2$ measures the percentage of total variation in the response variable that is explained by the least-squares regression line.
  32. 4.3

    The coefficient of determination is a number between 0 and 1, inclusive. What does it mean if $R^2 = 0$? If $R^2 = 1$?
    If $R^2 = 0$, the least-squares regression line has no explanatory value.

    If $R^2 = 1$, the least-squares regression line explains 100% of the variation in the response variable.
  33. What is deviation?
    The difference between the predicted values and the actual values. This is due to factors other than the explanatory variable and to random error.
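  34. What is total deviation? Explained deviation? Unexplained deviation? Give the formula for calculating each and for the total deviation.
    Total deviation: the deviation between the observed value, $y$, and mean value, $\bar{y}$, of the response variable.

    total deviation = $y - \bar{y}$

    Explained deviation: the deviation between the predicted value, $\hat{y}$, and mean value, $\bar{y}$, of the response variable.

    explained deviation = $\hat{y} - \bar{y}$

    Unexplained deviation: the deviation between the observed value, $y$, and predicted value, $\hat{y}$, of the response variable.

    unexplained deviation = $y - \hat{y}$

    total deviation = unexplained deviation + explained deviation

    $y - \bar{y} = (y - \hat{y}) + (\hat{y} - \bar{y})$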
  35. 4.3

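    The (closer/farther) the observed y's are to the regression line (the predicted y's), the (larger/smaller) $R^2$ will be.

    How can this be described in mathematical terms?
    closer, larger

    $R^2 = \dfrac{\text{explained variation}}{\text{total variation}} = 1 - \dfrac{\text{unexplained variation}}{\text{total variation}}$, so the smaller the unexplained variation, the larger $R^2$.

    It makes sense, then, that the farther the observed y's are from the regression line (the predicted y's), the smaller $R^2$ will be.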

  36. 4.3

    Unexplained variation is found by _______ the _______ of the _________, [formula].
    summing, squares, residuals, $\sum \text{residuals}^2 = \sum (y - \hat{y})^2$

    A large value (close to 1 or 100%) of $R^2$ implies that the unexplained variation is a small portion of the total variation. Equivalently, a large value of $R^2$ implies that the explained variation is a large portion of the total variation.

    Remember that this coefficient is based on the variation of the response variable, not the explanatory variable.
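
A rough sketch of how total, explained, and unexplained variation relate to the coefficient of determination. It assumes a least-squares line with one explanatory variable; the slope is computed here as the sum of cross-deviations over the sum of squared x-deviations, which is algebraically the same as r times the ratio of standard deviations. The data and variable names are invented for illustration.

```python
# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and y-intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Predicted values from the fitted line.
y_hats = [b1 * x + b0 for x in xs]

# Variation = sum of squared deviations.
total_variation = sum((y - y_bar) ** 2 for y in ys)
explained_variation = sum((yh - y_bar) ** 2 for yh in y_hats)
unexplained_variation = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))

r_squared = explained_variation / total_variation
print(round(total_variation, 4),
      round(explained_variation, 4),
      round(unexplained_variation, 4),
      round(r_squared, 4))
```

For a least-squares line, the explained and unexplained variation add up to the total variation, so `1 - unexplained_variation / total_variation` gives the same value as `r_squared`.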
  37. 4.3

    T or F

    Squaring the linear correlation coefficient to obtain the coefficient of determination works only for the least-squares regression model with more than one explanatory variable.
    False. It works only for the least-squares regression model with one explanatory variable.
  38. 4.3

    What are the three purposes for which we analyze residuals?
    (1) To determine whether a linear model is appropriate to describe the relation between the explanatory and response variables.

    (2) To determine whether the variance of the residuals is constant.

    (3) To check for outliers.
  39. 4.3

    What is a residual plot? What is the purpose of a residual plot? What is unexplained variation?
    A "scatter plot" where the explanatory variable is plotted on the (x) horizontal axis and the corresponding residual is on the (y) vertical axis.

    The purpose of a residual plot is to analyze the unexplained variation present in a data sample (not the explained or total variation).

    Unexplained variation (SSE - sum of squares error, or SSR - sum of squared residuals) is a measure of the difference between the observed value, $y$, and the predicted value, $\hat{y}$.
  40. 4.3

    T or F

    If a plot of the residuals against the explanatory variable shows a discernible pattern, such as a curve, the response and explanatory variable may be linearly related.
    False. The response and explanatory variable may not be linearly related.
  41. 4.3

    It is important for the residuals to have constant error variance because...
    if the model is used to make predictions, we want those predictions to be equally reliable across all values of the explanatory variable, rather than more reliable at some values than at others.
  42. 4.3

    What is an influential observation?
    An observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation.

    It significantly affects the least-squares regression line's slope and/or y-intercept. (It also affects the value of the correlation coefficient.)

    If the value of the correlation coefficient (r) changes significantly, but the slope and the y-intercept do not, then do not consider the point influential.
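
One informal way to see whether an observation is influential is to fit the least-squares line with and without it and compare the slope and y-intercept. The sketch below does that on made-up data in which the last point has an x value far from the others; the helper name `fit_line` is hypothetical, and what counts as a "noticeable" change is left to the reader's judgment.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data; the last observation (20, 0) sits far from the rest.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

slope_with, intercept_with = fit_line(xs, ys)
slope_without, intercept_without = fit_line(xs[:-1], ys[:-1])

print("with point:   ", round(slope_with, 4), round(intercept_with, 4))
print("without point:", round(slope_without, 4), round(intercept_without, 4))
```

In this made-up example the slope changes sign when the last point is deleted, which is the kind of change that would mark the point as influential.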
  43. 4.3

    What do we call the relative vertical position of an observation?
    Residual
  44. 4.3

    What do we call the relative horizontal position of an observation?
    Leverage

    Leverage is a measure that depends on how much the observation's value of the explanatory variable differs from the mean value of the explanatory variable.
  45. 4.3

    T or F

    As with outliers, influential observations should never be removed.
    False, as with outliers, influential observations should be removed only if there is justification to do so.

    When an influential observation occurs in a data set and its removal is not warranted, two possible courses of action are to (1) collect more data so that additional points near the influential observation are obtained or (2) use techniques that reduce the influence of the influential observation.
  46. Which of the following conditions might indicate that a linear model would not be appropriate?

    (a) constant error variance
    (b) patterned residuals
    (c) none
    (d) outlier

    Residuals are ________ to _________ whether a linear model is ___________ to describe the ________ between the ___________ and ________ variables, to determine whether the ________ of the _________ is ________, and to check for ________.

    To determine if a ______ model is ___________, a ________ plot is used, which is a _______ _______ with the _________ on the ________ axis and the ___________ variable on the __________ axis.
    (b) patterned residuals

    analyzed, determine, appropriate, relation, explanatory (independent), response (dependent), variance, residuals, constant, outliers

    linear, appropriate, residual, scatter diagram, residuals, vertical (Y), explanatory (independent), horizontal (X)
  47. The proportion of the variability explained by the relation between the explanatory (independent) and response (dependent) variable is measured by __.
    $R^2$
