# Data Management Definitions

The flashcards below were created by user hsu.kaitlyn on FreezingBlue Flashcards.

1. Permutation
An arrangement or sequence of events/objects/elements in which order matters. Ex. Timetable, combination lock
2. Combination
A grouping or set of events/objects/elements in which order does not matter. Ex. Forming a committee
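
The sketch below contrasts combinations with permutations. It assumes a hypothetical group of five people from which a committee of three is formed. `itertools.combinations` and `math.comb` ignore order, while `itertools.permutations` counts every ordering separately.

```python
import math
from itertools import combinations, permutations

# Hypothetical group of 5 people.
people = ["Ana", "Ben", "Cai", "Dev", "Eli"]

# Combinations: order does not matter, so {Ana, Ben, Cai} is one committee.
committees = list(combinations(people, 3))
print(len(committees), math.comb(5, 3))  # 10 10

# Permutations: order matters, so each committee can be ordered 3! = 6 ways.
ordered = list(permutations(people, 3))
print(len(ordered), len(committees) * math.factorial(3))  # 60 60
```
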
3. The Fundamental Counting Principle
Used to determine the number of ways a sequence of tasks can be done: if one task can be done in m ways and a second task in n ways, both can be done in m × n ways.
4. Principle of Inclusion and Exclusion
Used to determine the number of elements in two or more sets combined, without counting shared elements twice. For two sets: n(A ∪ B) = n(A) + n(B) − n(A ∩ B).
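
A quick check with Python sets. It assumes a hypothetical universal set of the integers 1 to 100, with A = multiples of 3 and B = multiples of 5. Adding both sets counts the 6 multiples of 15 twice, so they are subtracted once.

```python
# Hypothetical universal set: the integers 1 to 100.
universe = set(range(1, 101))
A = {n for n in universe if n % 3 == 0}  # 33 multiples of 3
B = {n for n in universe if n % 5 == 0}  # 20 multiples of 5

# Inclusion-exclusion: add both sets, then subtract the overlap counted twice.
by_formula = len(A) + len(B) - len(A & B)
print(by_formula, len(A | B))  # 47 47
```
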
5. Partitions
Used when two or more items must be together.
6. Indirect Method
When a set is too hard to count directly, count its complement and subtract that count from the number of elements in the universal set.
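
A minimal sketch of the indirect method, assuming a hypothetical question: how many 3-digit codes (000 to 999) contain at least one 7? The complement (codes with no 7) is easy to count, and a brute-force direct count confirms the answer.

```python
from itertools import product

# Universal set: all 3-digit codes 000-999.
total = 10 ** 3
# Complement: codes with no 7, so 9 choices for each of the 3 positions.
no_seven = 9 ** 3
indirect = total - no_seven

# Direct brute-force count, for comparison only.
direct = sum(1 for code in product(range(10), repeat=3) if 7 in code)
print(indirect, direct)  # 271 271
```
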
7. Case Method
For some problems, separate into cases and add cases together.
8. Set
A collection of items called elements.
9. Subset
A set all of whose elements are contained in the original set.
10. Universal Set
The set of all elements under consideration; every other set in the problem is a subset of it.
11. Complement
A set of all elements that are in the universal set but not in set A.
12. Disjoint or Mutually Exclusive
When two sets have no elements in common. (Every set and its complement are disjoint)
13. Union
Includes all elements that are in either set (or in both).
14. Intersection
Only includes elements common to both sets.
15. The Empty/Null Set
A set containing no elements.
16. Cardinality of a Set
The number of elements in the set.
17. Factorial
Multiplying a series of descending natural numbers: n! = n × (n − 1) × (n − 2) × … × 2 × 1. By definition, 0! = 1.
18. Permutation Notation
Used to determine the number of permutations of r items selected from n distinct items: nPr = n!/(n − r)!
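
The nPr formula can be sketched directly from factorials. The helper name `n_p_r` is hypothetical; `math.perm` (Python 3.8+) is the standard-library equivalent. The example assumes a president, vice-president, and treasurer chosen from 8 people.

```python
import math

def n_p_r(n: int, r: int) -> int:
    """Hypothetical helper: nPr = n! / (n - r)!, ordered arrangements of r of n items."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return math.factorial(n) // math.factorial(n - r)

# Hypothetical example: 3 distinct positions filled from 8 people.
print(n_p_r(8, 3))      # 336, i.e. 8 x 7 x 6
print(math.perm(8, 3))  # 336, built-in equivalent
```
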
19. Identical Elements
Elements that are exactly the same. We do not count them as distinct elements: n items with a alike, b alike, and so on can be arranged in n!/(a! × b! × …) ways.
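
A short sketch for arrangements with identical elements, using the word "BANANA" as an assumed example (three A's, two N's). The helper `arrangements` is hypothetical; the brute-force check removes duplicate orderings with a set.

```python
import math
from collections import Counter
from itertools import permutations

def arrangements(word: str) -> int:
    """n! divided by k! for each group of k identical letters."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)
    return result

print(arrangements("BANANA"))            # 60 = 6! / (3! x 2!)
print(len(set(permutations("BANANA"))))  # 60, duplicates removed by the set
```
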
20. Rule of Sum
(OR) Count using the Principle of Inclusion and Exclusion or add sets together if they are mutually exclusive.
21. Rule of Product
(AND) Count using the Fundamental Counting Principle: multiply.
22. Probability
A branch of mathematics that investigates through experiment, calculation, and reasoning the likelihood of specified events.

- A measure of the likelihood of an event or outcome.

- The chance that an event or outcome will occur.
23. Probability Experiment
A well defined process consisting of a number of trials in which clearly defined outcomes are observed.
24. Outcome
Possible result of a single trial.
25. Sample Space
Set of all possible outcomes of an experiment. (S or U)
26. Trial
A single run through the process of an experiment.
27. Event
One or more outcomes of an experiment, grouped together.
28. Subjective Probability
An estimate based on experience or intuition.
29. Experimental Probability
Conduct an experiment with n trials in order to find the probability of an event A: P(A) = n(A)/n, where n(A) is the number of trials in which A occurred.
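
A sketch of experimental probability as a simulation, assuming a fair die and the event A = "roll a 6". The function name and the fixed seed are assumptions, used so a run can be repeated. With few trials the estimate can stray far from 1/6 (the statistical fluctuation card below); with many trials it settles near it.

```python
import random

def experimental_probability(trials: int, seed: int = 1) -> float:
    """Roll a simulated fair die `trials` times; return n(A)/n for A = "roll a 6"."""
    rng = random.Random(seed)  # seeded so the run can be repeated exactly
    successes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return successes / trials

# Small numbers of trials fluctuate more than large ones.
for n in (10, 100, 100_000):
    print(n, experimental_probability(n))
```
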
30. Theoretical Probability
Probability is calculated, not experimentally determined. Assumes all outcomes are equally likely: P(A) = n(A)/n(S).
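
A minimal theoretical-probability sketch, assuming two fair dice and the event A = "the dice sum to 7". The sample space is listed explicitly and `Fraction` keeps the answer exact.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely ordered rolls of two fair dice.
S = list(product(range(1, 7), repeat=2))
# Event A: the two dice sum to 7.
A = [roll for roll in S if sum(roll) == 7]

p_A = Fraction(len(A), len(S))  # P(A) = n(A) / n(S)
print(p_A)  # 1/6
```
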
31. Statistical Fluctuation
Experimental results are misleading, or a certain result is exaggerated, because of a small number of trials.
32. Odds
A comparison of the probability of an event occurring to the probability of it not occurring. Odds in favour of A = P(A) : P(A′).
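
A sketch converting between probability and odds in favour. Both helper names are hypothetical. The conversion back uses P(A) = a/(a + b) for odds of a:b.

```python
from fractions import Fraction

def odds_in_favour(p: Fraction) -> tuple[int, int]:
    """Odds in favour a:b, in lowest terms, where a/b = P(A) / P(A')."""
    if not 0 <= p < 1:
        raise ValueError("need 0 <= P(A) < 1")
    ratio = p / (1 - p)
    return ratio.numerator, ratio.denominator

def probability_from_odds(a: int, b: int) -> Fraction:
    """Odds of a:b in favour of A correspond to P(A) = a / (a + b)."""
    return Fraction(a, a + b)

print(odds_in_favour(Fraction(1, 6)))  # (1, 5): odds of rolling a 6 are 1:5
print(probability_from_odds(3, 2))     # 3/5
```
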
33. Compound Events
Multiple events occurring.
34. Independent events
When the outcome/occurrence of one event does not affect the outcome/occurrence of another.
35. Dependent Events
When the outcome/occurrence of one event affects the outcome/occurrence of another.
36. Mutually Exclusive Events
Events that cannot occur at the same time (no intersection).
37. Non-Mutually Exclusive Events
Events that can occur at the same time (possible intersection).
38. Probability Distribution
A distribution of probabilities of all possible outcomes of an experiment.
39. Random Variable (X)
A variable that represents the numerical value of each possible outcome of an experiment.
40. x
Individual value of X.
41. Discrete
Values are separate and distinct. Finite number in an interval.
42. Continuous
Values can be any real number in an interval. Infinite number of values in an interval.
43. Uniform Probability Distribution
A distribution of probabilities with equally likely outcomes.
44. Non-Uniform Probability Distribution
Not all outcomes have the same probability.
45. Expected Value
The "average" outcome over many trials: E(X) = Σ x · P(x).
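
A sketch of E(X) = Σ x · P(x) using exact fractions. The distributions are hypothetical: a fair die, and a game that pays $5 with probability 1/10 and costs $1 otherwise.

```python
from fractions import Fraction

def expected_value(dist: dict) -> Fraction:
    """E(X) = sum of x * P(x) over every value x of X."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# Uniform distribution: X = value shown on one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))   # 7/2, i.e. 3.5

# Non-uniform distribution: hypothetical game, win $5 (p = 1/10) or lose $1.
game = {5: Fraction(1, 10), -1: Fraction(9, 10)}
print(expected_value(game))  # -2/5, i.e. lose $0.40 per play on average
```
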
46. The Binomial Distribution
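Used for experiments involving a fixed number n of repeated trials of independent events, each classified as success or failure, with the same probability of success p on every trial. P(x) = nCx p^x q^(n − x), where q = 1 − p.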
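
A sketch of the binomial probability formula. It assumes a hypothetical quiz of 10 multiple-choice questions, each with 4 options, answered by guessing (p = 0.25). The expected value computed from the distribution matches np.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: 10 guesses, each correct with probability 0.25.
n, p = 10, 0.25
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(round(dist[3], 4))                                    # P(exactly 3 correct): 0.2503
print(round(sum(x * px for x, px in enumerate(dist)), 4))   # E(X) = np = 2.5
```
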
47. Geometric Distribution
Used to find the probability of x failures before the first success in an experiment. Requires independent trials that can be classified as success or failure, repeated until the first success: P(x) = q^x p, where q = 1 − p.
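
A sketch of the geometric distribution under the "x failures before the first success" definition used in this card. The example assumes a die rolled until the first 6 (p = 1/6).

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = q^x * p: x failures followed by the first success."""
    return (1 - p) ** x * p

# Hypothetical example: rolling a die until the first 6, so p = 1/6.
p = 1 / 6
print(round(geometric_pmf(0, p), 4))  # 6 on the first roll: 0.1667
print(round(geometric_pmf(2, p), 4))  # two non-6s, then a 6: 0.1157
# Under this definition, the expected number of failures is q / p = 5.
```
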
48. Hypergeometric Distribution
A probability distribution for experiments in which trials are not independent (e.g. selecting items without replacement).
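
A sketch of the hypergeometric probability P(x) = aCx × (n − a)C(r − x) / nCr. The parameter names are an assumption: n items in the population, a of them "successes", r drawn without replacement. The example assumes 5 cards dealt from a standard 52-card deck with 13 hearts.

```python
from math import comb

def hypergeometric_pmf(x: int, n: int, a: int, r: int) -> float:
    """P(X = x) = aCx * (n - a)C(r - x) / nCr.

    n: population size, a: successes in the population,
    r: items drawn without replacement, x: successes drawn.
    """
    return comb(a, x) * comb(n - a, r - x) / comb(n, r)

# Hypothetical example: exactly 2 hearts in a 5-card hand.
print(round(hypergeometric_pmf(2, n=52, a=13, r=5), 4))  # 0.2743
```
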
49. Measures of Central Tendency
Mean, Median, Mode
50. Mean
The average value: the sum of all data values divided by the number of values.
51. Median
The middle value when the data is arranged in order.
52. Mode
The most frequent value.
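
The three measures of central tendency can be checked with Python's `statistics` module. The data set is hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 9, 13]  # hypothetical data set

print(statistics.mean(data))    # 6: sum of 42 divided by 7 values
print(statistics.median(data))  # 5: the middle value once sorted
print(statistics.mode(data))    # 3: the most frequent value
```
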
53. Unimodal
The data has only one peak/mode.
54. Symmetric Distribution
Values are distributed symmetrically around the mean. The mean, median, and mode are the same.
55. Distribution Skewed Left
There are more values on the right side, with a tail extending to the left. Goes from left to right: mean, median, mode (negatively skewed).
56. Distribution Skewed Right
There are more data values on the left side, with a tail extending to the right. Goes from left to right: mode, median, mean (positively skewed).
57. Standard Deviation
A measure of the typical distance of a datum from the mean of a data set. It is the square root of the variance: σ = √(Σ(x − μ)² / n).
58. Normal Distribution
Models continuous data that is distributed unimodally and symmetrically about the mean.
59. Standard Normal Distribution
A normal distribution with a mean of 0 and standard deviation of 1.
60. Continuity Correction
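Altering an interval by 0.5 at each end so that a certain discrete value is included when a continuous distribution, such as the normal distribution, is used to approximate a discrete one. Ex. P(X ≤ 5) becomes P(X < 5.5).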
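
A sketch showing why the continuity correction helps, assuming a hypothetical binomial variable X = number of heads in 20 fair coin flips. `statistics.NormalDist` (Python 3.8+) supplies the normal CDF; extending the interval to 5.5 brings the approximation closer to the exact binomial value.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5  # hypothetical: X = number of heads in 20 fair coin flips

# Exact binomial probability P(X <= 5).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(6))

# Normal distribution with the binomial's mean np and standard deviation sqrt(npq).
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_cc = normal.cdf(5)    # treats X <= 5 as Y <= 5
with_cc = normal.cdf(5.5)     # continuity correction: extend the interval to 5.5

print(round(exact, 4), round(without_cc, 4), round(with_cc, 4))
```
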
61. Variance
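A measure of dispersion of the data in a data set: the mean of the squared deviations, σ² = Σ(x − μ)² / n. It is the square of the standard deviation.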
62. Deviation
The distance of an individual datum in a data set from the mean.
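
The deviation, variance, and standard deviation cards above can be computed step by step. This sketch assumes a hypothetical data set treated as a whole population (dividing by n); the `statistics` module's `variance`/`stdev` divide by n − 1 instead, for sample data.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population data

mu = sum(data) / len(data)                              # mean: 40 / 8 = 5.0
deviations = [x - mu for x in data]                     # signed distance of each datum from the mean
variance = sum(d ** 2 for d in deviations) / len(data)  # population variance: 32 / 8 = 4.0
sigma = math.sqrt(variance)                             # standard deviation: 2.0

print(mu, variance, sigma)  # 5.0 4.0 2.0
# Library equivalents: pvariance/pstdev divide by n; stdev divides by n - 1 (sample).
print(statistics.pvariance(data), statistics.pstdev(data))
print(statistics.stdev(data))
```
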
63. z-score
The number of standard deviations a value is away from the mean: z = (x − μ)/σ.
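
A sketch of the z-score formula, assuming hypothetical test marks with mean 70 and standard deviation 8. If the marks are normally distributed, `NormalDist().cdf` (the standard normal distribution) gives the proportion of marks below the value.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical marks: mean 70, standard deviation 8.
z = z_score(82, mu=70, sigma=8)
print(z)                              # 1.5
print(round(NormalDist().cdf(z), 4))  # proportion below 82: 0.9332
```
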
64. Statistics
A branch of mathematics that deals with the gathering, organization, analysis, interpretation, and presentation of numerical information.
65. Raw Data
The original unprocessed information collected by the researcher.
66. Categorical Data
When the variable takes on categories (types) rather than numerical values.
67. Bar Graph
A graph that displays the frequencies of categorical data.
68. Histogram
A graph that displays the frequencies of numerical data grouped into intervals.
69. Frequency Polygon
A polygon that connects the midpoints at the top of each bar of a histogram.
70. Cumulative Frequency Diagrams
Shows the frequency at a value and all of the values below it.
71. Primary Data
Original data that is gathered by the researcher.
72. Secondary Data
Found data that the researcher uses which was gathered by others.
73. Population
An entire group of individuals being studied.
74. Sample
A subgroup of the population.
75. Sampling frame
The group of individuals that actually have a chance of being chosen for the sample.
76. Random Sample
When every individual in the population has an equal chance of being chosen for the sample.
77. Simple Random Sample
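Every group of the same size has an equal chance of being selected; members are chosen using a random process such as a random number generator or simulation.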
78. Systematic Random Sample
When the researcher goes through the population sequentially and selects members at regular intervals.
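
A minimal sketch of systematic random sampling. The function name, the seed, and the population of 500 students are hypothetical. The interval k is the population size divided by the sample size, and the starting point is chosen at random within the first interval.

```python
import random

def systematic_sample(population: list, sample_size: int, seed: int = 0) -> list:
    """Random start within the first interval, then every k-th member."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and len(population)")
    k = len(population) // sample_size        # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return population[start::k][:sample_size]

students = [f"student_{i}" for i in range(1, 501)]  # hypothetical list of 500
sample = systematic_sample(students, 25)
print(len(sample), sample[:3])
```
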
79. Stratified Sampling
The population is divided into strata, and the number of people from each stratum in the sample is proportional to the number of people in that stratum in the population.
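
A sketch of proportional allocation for stratified sampling, with a simple random sample drawn inside each stratum. The school and its grade sizes are hypothetical and were chosen so each share is a whole number; in general, rounding can make the total differ slightly from the target size.

```python
import random

def stratified_sample(strata: dict, sample_size: int, seed: int = 0) -> dict:
    """Each stratum's share of the sample matches its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_i = round(sample_size * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, n_i)          # random sample within the stratum
    return sample

# Hypothetical school of 1000 students by grade.
school = {
    "Grade 9": [f"g9_{i}" for i in range(300)],
    "Grade 10": [f"g10_{i}" for i in range(250)],
    "Grade 11": [f"g11_{i}" for i in range(250)],
    "Grade 12": [f"g12_{i}" for i in range(200)],
}
result = stratified_sample(school, 40)
print({grade: len(members) for grade, members in result.items()})
```
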
80. Cluster Sampling
When the population is divided into groups (clusters), and one or more whole groups that are likely to be a good representation of the population are chosen for the sample.
81. Multi Stage Sampling
Various random samples are taken to choose groups and subgroups of a population until arriving at the sample members.
82. Voluntary Response Sampling
When the researcher simply invites members of the population to participate in the study.
83. Convenience Sampling
A sample is chosen that is easily accessible.
84. Snowball Sampling
When a small sample is surveyed and sample members are asked to pass the survey along to their friends, who pass it along to their friends, to get a larger sample.
85. Judgemental Sampling
When the researcher uses his or her judgement to choose members he or she believes will be appropriate for the study.
86. Quota Sampling
A quota is established and the researcher can choose anyone who fits the quota for the sample.
87. Bias
The tendency of a factor to favour certain outcomes or responses which systematically skews results.
88. Sampling Bias
The sampling frame is not an accurate representation of the entire population.
89. Non-Response Bias
Certain groups are underrepresented because they choose not to participate in the study.
90. Measurement Bias
The data collection method systematically overestimates or underestimates a certain characteristic, which skews results.
91. Response Bias
Participants deliberately give false or misleading responses which skews results.
92. Voluntary Response Bias
Sample members are self-selected, which overrepresents those with strong opinions.
93. Outliers
Data values that are distant from the majority of the data.
94. Range
The difference between the highest data value and the lowest data value. Very sensitive to outliers.
95. Mean Absolute Deviation (MAD)
Similar to standard deviation, measures the average distance of a datum from the mean: MAD = Σ|x − x̄| / n. However, MAD is less commonly used than standard deviation.
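
A sketch of the mean absolute deviation next to the population standard deviation, for the same hypothetical data set (mean 5). The helper name is hypothetical.

```python
import statistics

def mean_absolute_deviation(data: list[float]) -> float:
    """Average of |x - mean| over all data values."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(mean_absolute_deviation(data))  # 1.5
print(statistics.pstdev(data))        # population standard deviation, for comparison
```
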
96. Quartiles
Values that divide the data into four sections that each have an equal number of data values.
97. Interquartile Range
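The range of the middle half of the data: IQR = Q3 − Q1. Includes the middle 50% of the data, around the median.
98. Semi-Interquartile Range
Half of the interquartile range: (Q3 − Q1)/2. For a roughly symmetric distribution, about 25% of the data lies within one semi-interquartile range on each side of the median.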
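
Quartile conventions differ between textbooks and software. This sketch assumes the median-of-halves method (the overall median is left out of both halves when n is odd), so other tools may give slightly different Q1 and Q3. The data set is hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) by the median-of-halves method: Q1 and Q3 are the
    medians of the lower and upper halves, excluding the overall median when
    the number of values is odd."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    lower = values[: n // 2]
    upper = values[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(values), statistics.median(upper)

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # hypothetical data set
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3)             # 6.0 12 16.0
print(iqr, iqr / 2)           # IQR 10.0, semi-IQR 5.0
print(max(data) - min(data))  # range: 18
```
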
99. Box and Whisker Plot
A visual representation of a data set to illustrate quartiles.
100. Modified Box and Whisker Plot
A box and whisker plot that shows outliers. The outliers are plotted as separate points beyond the whiskers.
101. Percentiles
Similar to quartiles but the data is divided into 100 intervals with an equal number of data values in each interval.
102. Two Variable Statistics
Provides ways to detect relationships between variables and develop mathematical models for them.
103. Correlation
104. Linear Correlation
When the changes in x are proportional to the changes in y.
105. Positive Linear Correlation
As x increases, y increases.
106. Negative linear correlation
As x increases, y decreases.
107. Line of Best Fit
Line on a scatterplot that shows the pattern and direction of the points. Line that is the closest to the plotted points. Passes through as many points as possible with the remaining points grouped evenly above and below the line. Can be used to make predictions about values that are not given or recorded.
108. Interpolating
Predicting a value that is within the range of the plotted points.
109. Extrapolating
Predicting a value beyond the range of the plotted points.
110. Strong Correlation
When the plotted points lie very close to the line of best fit (lbf).
111. Weak Correlation
When the points are more dispersed but still form a rough line.
112. Perfect Correlation
When the points lie exactly on the lbf.
113. Covariance
A measure of the strength of a correlation between two variables. Depends on units.
114. Correlation Coefficient
A quantitative measure of the strength of a linear correlation, r, ranging from −1 to 1. Shows how closely the points cluster around the lbf.
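
A sketch of covariance and the correlation coefficient. It assumes the sample versions (dividing by n − 1) and hypothetical study-hours and test-mark data. Rescaling the marks changes the covariance, because covariance depends on units, but leaves r unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation_coefficient(xs, ys):
    """Pearson's r = s_xy / (s_x * s_y), always between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]       # hypothetical study hours
marks = [52, 60, 61, 70, 77]  # hypothetical test marks (%)
print(covariance(hours, marks), round(correlation_coefficient(hours, marks), 4))  # 15.0 0.9811

# Rescaling marks from percent to a proportion changes the covariance but not r.
proportions = [m / 100 for m in marks]
print(covariance(hours, proportions), round(correlation_coefficient(hours, proportions), 4))
```
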
115. Linear Regression
An analytical technique used to determine the relationship/model/equation between two variables.
116. Find the linear correlation
Refers to finding the equation of the lbf.
117. Perform a linear regression
Refers to the process of finding the equation of the lbf.
118. Residual
The positive or negative vertical distance between a data value and the lbf.
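
The cards do not say how the lbf is found. A common choice, assumed here, is the least-squares method, which picks the line y = ax + b that minimizes the sum of squared residuals. The sketch reuses hypothetical hours and marks data and also computes each residual (actual y minus predicted y).

```python
def least_squares_line(xs, ys):
    """Return slope a and intercept b of y = ax + b minimizing squared residuals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    b = sum_y / n - a * sum_x / n
    return a, b

hours = [1, 2, 3, 4, 5]       # hypothetical study hours
marks = [52, 60, 61, 70, 77]  # hypothetical test marks
a, b = least_squares_line(hours, marks)
print(a, b)  # 6.0 46.0, so marks are approximately 6(hours) + 46

# Residual = actual y - predicted y (positive above the line, negative below).
residuals = [y - (a * x + b) for x, y in zip(hours, marks)]
print(residuals)     # [0.0, 2.0, -3.0, 0.0, 1.0]
print(a * 3.5 + b)   # interpolating: predicted mark for 3.5 hours, 67.0
```
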
119. Non-Linear Regression
An analytical technique for finding the equation of the curve of best fit.
120. Cause and effect relationship
A change in x produces a change in y.
121. Common cause factor
An external variable causes changes in both variables.
122. Reverse cause and effect relationship
The independent and dependent variables are reversed in the process of establishing causality.
123. Accidental Relationship
A correlation exists with no causal relationship. Variables are completely unrelated to each other.
124. Presumed relationship
A correlation exists with no causal relationship. Variables are related to each other, but it is difficult to find a common cause factor or cause and effect relationship.
125. Extraneous variables
External variables that can influence results and the relationship. May influence independent or dependent variable.
126. Double Blind Study
Neither the participants nor the researchers know who is in which group.
127. Coefficient of Determination
A general measure of how well a specific regression curve fits the data. For a linear regression, it equals r², the square of the correlation coefficient.
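
A sketch of the coefficient of determination as 1 − (sum of squared residuals)/(total sum of squares about the mean). It assumes hypothetical hours and marks data, for which y = 6x + 46 is the least-squares line. For this linear fit the result agrees with r² from the correlation coefficient.

```python
def coefficient_of_determination(ys, predictions):
    """R^2 = 1 - (sum of squared residuals) / (sum of squares about the mean)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

hours = [1, 2, 3, 4, 5]                     # hypothetical study hours
marks = [52, 60, 61, 70, 77]                # hypothetical test marks
predicted = [6.0 * x + 46.0 for x in hours]  # least-squares line for this data
print(round(coefficient_of_determination(marks, predicted), 4))  # 0.9626
```
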
Author: hsu.kaitlyn | ID: 275252 | Card Set: Data Management Definitions | Updated: 2014-06-11 00:18:48 | Tags: Math Gr12 | Description: Gr. 12 Data Management definitions