Data Management Definitions

Card Set Information

2014-06-10 20:18:48
Math Gr12

Gr. 12 Data Management definitions

  1. Permutation
    An arrangement or sequence of events/objects/elements in which order matters. Ex. Timetable, combination lock
  2. Combination
    A grouping or set of events/objects/elements in which order does not matter. Ex. Forming a committee
  3. The Fundamental Counting Principle
    Used to count the outcomes of a sequence of choices: if one choice can be made in m ways and a second in n ways, both can be made in m × n ways.
  4. Principle of Inclusion and Exclusion
    Used to determine the number of elements in the union of two or more sets without counting shared elements twice. For two sets: n(A ∪ B) = n(A) + n(B) - n(A ∩ B).
  5. Partitions
    Used when two or more items must be together.
  6. Indirect Method
    When a set is too hard to count directly, count its complement and subtract that count from the size of the universal set.
  7. Case Method
    For some problems, separate into cases and add cases together.
  8. Set
    A collection of items called elements.
  9. Subset
    A set all of whose elements are contained in the original set.
  10. Universal Set
    The all-encompassing set: it contains every element under consideration.
  11. Complement
    A set of all elements that are in the universal set but not in set A.
  12. Disjoint or Mutually Exclusive
    When two sets have no elements in common. (Every set and its complement are disjoint)
  13. Union
    Includes all elements of both sets.
  14. Intersection
    Only includes elements common to both sets.
  15. The Empty/Null Set
    A set containing no elements.
  16. Cardinality of a Set
    The number of elements in the set.
  17. Factorial
    The product of the natural numbers from n down to 1: n! = n × (n - 1) × ... × 2 × 1, with 0! defined as 1.
  18. Permutation Notation
    Used to determine the number of permutations of r items selected from n distinct items, written P(n, r) or nPr and equal to n! / (n - r)!
  19. Identical Elements
    Elements that are exactly the same. We do not count them as distinct elements.
  20. Rule of Sum
    (OR) Count using the Principle of Inclusion and Exclusion or add sets together if they are mutually exclusive.
  21. Rule of Product
    (AND) Count using the Fundamental Counting Principle - Multiply
  22. Probability
    A branch of mathematics that investigates through experiment, calculation, and reasoning the likelihood of specified events. 

    - A measure of the likelihood of an event or outcome.

    - The chance that an event or outcome will occur.
  23. Probability Experiment
    A well defined process consisting of a number of trials in which clearly defined outcomes are observed.
  24. Outcome
    Possible result of a single trial.
  25. Sample Space
    Set of all possible outcomes of an experiment. (S or U)
  26. Trial
    One run-through of the process of an experiment.
  27. Event
    One or more outcomes (can be grouped).
  28. Subjective Probability
    An estimate based on experience or intuition.
  29. Experimental Probability
    The probability of an event A estimated by conducting an experiment with n trials: P(A) = (number of times A occurs) / n.
  30. Theoretical Probability
    Probability is calculated, not experimentally determined. Assumes all outcomes are equally likely.
  31. Statistical Fluctuation
    Experimental results differ from the theoretical probability, or a certain result is exaggerated, because of a small number of trials.
  32. Odds
    A comparison (ratio) of the probability of an event occurring to the probability of that event not occurring.
  33. Compound Events
    Multiple events occurring.
  34. Independent events
    When the outcome/occurrence of one event does not affect the outcome/occurrence of another.
  35. Dependent Events
    When the outcome/occurrence of one event affects the outcome/occurrence of another.
  36. Mutually Exclusive Events
    Events that cannot occur at the same time (no intersection).
  37. Non-Mutually Exclusive Events
    Events that can occur at the same time (possible intersection).
  38. Probability Distribution
    A distribution of probabilities of all possible outcomes of an experiment.
  39. Random Variable (X)
    Variable that represents all possible outcomes of an experiment.
  40. x
    Individual value of X.
  41. Discrete
    Values are separate and distinct. A finite number of values in an interval.
  42. Continuous
    Values can be any real number. An infinite number of values in an interval.
  43. Uniform Probability Distribution
    A distribution of probabilities with equally likely outcomes.
  44. Non-Uniform Probability Distribution
    Not all outcomes have the same probability.
  45. Expected Value
    The "average" outcome.
  46. The Binomial Distribution
    Used for experiments involving repeated trials of independent events which can be classified as success or failure.
  47. Geometric Distribution
    Used to find the probability of x failures before the first success in an experiment. Requires independent events that can be classified as success or failure and repeated trials until the first success.
  48. Hypergeometric Distribution
    A probability distribution for experiments in which trials are not independent, such as selecting items without replacement.
  49. Measures of Central Tendency
    Mean, Median, Mode
  50. Mean
    The average value: the sum of the data values divided by the number of values.
  51. Median
    The middle value when the data are arranged in order.
  52. Mode
    The most frequent value.
  53. Unimodal
    The data has only one peak/mode.
  54. Symmetric Distribution
    Values are distributed symmetrically around the mean. The mean, median, and mode are the same.
  55. Distribution Skewed Left
    There are more values on the right side, with a tail extending to the left. Goes from left to right: mean, median, mode. (Negatively Skewed)
  56. Distribution Skewed Right
    There are more data values on the left side, with a tail extending to the right. Goes from left to right: mode, median, mean. (Positively Skewed)
  57. Standard Deviation
    A measure of the typical distance of a datum from the mean of a data set. Equal to the square root of the variance.
  58. Normal Distribution
    Models continuous data that is distributed unimodally and symmetrically about the mean.
  59. Standard Normal Distribution
    A normal distribution with a mean of 0 and standard deviation of 1.
  60. Continuity Correction
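    Widening an interval by 0.5 at each end so that a discrete value is included when a discrete distribution is approximated by a continuous one, such as the normal distribution.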
  61. Variance
    A measure of dispersion of the data in a data set: the mean of the squared deviations from the mean.
  62. Deviation
    The distance of an individual datum in a data set from the mean.
  63. z-score
    The number of standard deviations a value is away from the mean.
  64. Statistics
    A branch of mathematics that deals with the gathering, organization, analysis, interpretation, and presentation of numerical information.
  65. Raw Data
    The original unprocessed information collected by the researcher.
  66. Categorical Data
    When the variable takes on category types.
  67. Bar Graph
    A graph that displays the frequencies of categorical data.
  68. Histogram
    A graph that displays the frequencies of numerical data grouped into intervals.
  69. Frequency Polygon
    A polygon that connects the midpoints at the top of each bar of a histogram.
  70. Cumulative Frequency Diagrams
    Shows the combined frequency of a value and all of the values below it.
  71. Primary Data
    Original data that is gathered by the researcher.
  72. Secondary Data
    Found data that the researcher uses which was gathered by others.
  73. Population
    An entire group of individuals being studied.
  74. Sample
    A subgroup of the population.
  75. Sampling frame
    The group of individuals that actually have a chance of being chosen for the sample.
  76. Random Sample
    When every individual in the population has an equal chance of being chosen for the sample.
  77. Simple Random Sample
    Sample members are selected using a random simulation, such as a random number generator.
  78. Systematic Random Sample
    When the researcher goes through the population sequentially and selects members at regular intervals.
  79. Stratified Sampling
    The population is divided into strata, and the number of people sampled from each stratum is proportional to the number of people in that stratum in the population.
  80. Cluster Sampling
    When one or more groups are chosen for the sample that are likely to be a good representation of the population.
  81. Multi Stage Sampling
    Several random samples are taken to choose groups and subgroups of a population until arriving at the sample members.
  82. Voluntary Response Sampling
    When the researcher simply invites members of the population to participate in the study.
  83. Convenience Sampling
    A sample is chosen that is easily accessible.
  84. Snowball Sampling
    When a small sample is surveyed and sample members are asked to pass the survey along to their friends, who pass it along to their friends in turn, to get a larger sample.
  85. Judgemental
    When the researcher uses his or her judgement to choose members he or she believes will be appropriate for the study.
  86. Ad Hoc Quotas
    A quota is established and the researcher can choose anyone who fits the quota for the sample.
  87. Bias
    The tendency of a factor to favour certain outcomes or responses which systematically skews results.
  88. Sampling Bias
    The sampling frame is not an accurate representation of the entire population.
  89. Non-Response Bias
    Certain groups are under-represented because they choose not to participate in the study.
  90. Measurement Bias
    The data collection method systematically overestimates or underestimates a certain characteristic, which skews results.

    • Environment
    • Leading/loaded questions
  91. Response Bias
    Participants deliberately give false or misleading responses which skews results.
  92. Voluntary Response Bias
    Sample members are self-selected, which over-represents those with strong opinions.
  93. Outliers
    Data values that are distant from the majority of the data.
  94. Range
    The difference between the highest data value and the lowest data value. Very sensitive to outliers.
  95. Mean Absolute Value Deviation
    The mean of the absolute deviations of the data from the mean (MAD). Similar to standard deviation as a measure of spread, but used less often in further calculations.
  96. Quartiles
    Values that divide the data into four sections that each have an equal number of data values.
  97. Interquartile Range
    The range of the middle half of the data (Q3 - Q1). Includes the 50% of the data around the median.
  98. Semi-Interquartile Range
    Half of the interquartile range: (Q3 - Q1) / 2.
  99. Box and Whisker Plot
    A visual representation of a data set to illustrate quartiles.
  100. Modified Box and Whisker Plot
    A box and whisker plot that shows outliers separately. The outliers are plotted as individual points beyond the whiskers.
  101. Percentiles
    Similar to quartiles but the data is divided into 100 intervals with an equal number of data values in each interval.
  102. Two Variable Statistics
    Provides ways to detect relationships between variables and develop mathematical models for them.
  103. Correlation
    A relationship between two variables in which a change in one tends to accompany a change in the other.
  104. Linear Correlation
    When the changes in x are proportional to the changes in y.
  105. Positive Linear Correlation
    As x increases, y increases.
  106. Negative linear correlation
    As x increases, y decreases.
  107. Line of Best Fit (lbf)
    Line on a scatterplot that shows the pattern and direction of the points. Line that is the closest to the plotted points. Passes through as many points as possible with the remaining points grouped evenly above and below the line. Can be used to make predictions about values that are not given or recorded.
  108. Interpolating
    Predicting a value that is within the range of the plotted points.
  109. Extrapolating
    Predicting a value beyond the range of the plotted points.
  110. Strong Correlation
    When the plotted points lie very close to the lbf.
  111. Weak Correlation
    When the points are more dispersed but still form a rough line.
  112. Perfect Correlation
    When the points lie exactly on the lbf.
  113. Covariance
    A measure of the strength of a correlation between two variables. Depends on units.
  114. Correlation Coefficient
    A quantitative measure (r, between -1 and 1) of the strength and direction of a linear correlation. Indicates how closely the points cluster around the lbf.
  115. Linear Regression
    An analytical technique used to determine the relationship/model/equation between two variables.
  116. Find the linear correlation
    Refers to finding the equation of the lbf.
  117. Perform a linear regression
    Refers to the process of finding the equation of the lbf.
  118. Residual
    The positive or negative vertical distance between a data value and the lbf.
  119. Non-Linear Regression
    An analytical technique for finding the equation of the curve of best fit.
  120. Cause and effect relationship
    A change in x produces a change in y.
  121. Common cause factor
    An external variable changes both variables in the same way.
  122. Reverse cause and effect relationship
    The independent and dependent variables are reversed in the process of establishing causality.
  123. Accidental Relationship
    A correlation exists with no causal relationship. Variables are completely unrelated to each other.
  124. Presumed relationship
    A correlation exists with no apparent causal relationship. The variables are related to each other, but it is difficult to find a common cause factor or a cause and effect relationship.
  125. Extraneous variables
    External variables that can influence results and the relationship. May influence independent or dependent variable.
  126. Double Blind Study
    Neither the participants nor the researchers know who is in which group.
  127. Coefficient of Determination
    A general measure of how well a specific regression curve fits the data.
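
Several of the counting and distribution cards above (Factorial, Permutation Notation, Combination, Binomial, Geometric, and Hypergeometric Distribution) correspond to short formulas. The following is a minimal Python sketch of those formulas, not part of the original card set. The function and parameter names are illustrative assumptions, and the geometric formula follows the cards' convention of counting x failures before the first success.

```python
from math import comb, factorial, perm


def binomial_pmf(n: int, x: int, p: float) -> float:
    """P(X = x): exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def geometric_pmf(x: int, p: float) -> float:
    """P(X = x): x failures before the first success
    (the convention used on the Geometric Distribution card)."""
    return (1 - p) ** x * p


def hypergeometric_pmf(pop: int, successes: int, draws: int, x: int) -> float:
    """P(X = x): x successes in `draws` selections made without
    replacement from `pop` items, `successes` of which count as successes."""
    return comb(successes, x) * comb(pop - successes, draws - x) / comb(pop, draws)


# Counting with the standard library (Python 3.8+):
five_factorial = factorial(5)   # 5 x 4 x 3 x 2 x 1 = 120
ordered_pairs = perm(5, 2)      # 5! / (5 - 2)! = 20 (order matters)
unordered_pairs = comb(5, 2)    # 5! / (2! x 3!) = 10 (order does not matter)
```

For example, `binomial_pmf(3, 2, 0.5)` gives the probability of exactly two heads in three fair coin tosses, and `hypergeometric_pmf(5, 2, 2, 1)` gives the probability of drawing exactly one of two marked items when two items are taken from five without replacement.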
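
The cards on Mean, Deviation, Variance, Standard Deviation, z-score, and Correlation Coefficient can also be illustrated numerically. This is a rough Python sketch, not part of the original card set. It assumes population formulas (dividing by n rather than n - 1), and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def variance(data):
    """Population variance: the mean of the squared deviations from the mean."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)


def std_dev(data):
    """Population standard deviation: the square root of the variance."""
    return sqrt(variance(data))


def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    return (x - mean(data)) / std_dev(data)


def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r for paired data (xs[i], ys[i])."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, variance 4, standard deviation 2
z = z_score(9, data)              # (9 - 5) / 2 = 2.0
r = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])  # points lie on a line
```

In the last line the points lie exactly on a rising straight line, so r comes out as 1 (up to floating-point rounding), which matches the Perfect Correlation card.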