CIS2300_TEST1.txt

Population
- a collection of persons, objects or items of interest.
- Whatever the researcher is studying
parameter
- a descriptive measure of the population. Usually denoted by Greek letters
- e.g. mean (µ), population variance (σ^2), population standard deviation (σ)
- data from a census are parameters
sample

a portion of the whole and if taken properly, representative of the whole
statistic
- a descriptive measure of the sample. Usually denoted by Roman letters
- e.g. mean (x̄), sample variance (s^2), sample standard deviation (s)
- data from a sample are statistics
Descriptive Statistics
- Using data gathered on a group to describe or reach conclusions about that same group
- e.g. most athletic stats. The data is gathered from that group and conclusions are drawn about that group only. Basketball stats are about Basketball
Inferential Statistics
- gathering data from a sample and using the statistics generated to reach conclusions about the population from which the sample was taken
- sometimes referred to as inductive statistics
empirical rule
- The approximate percentage of values that lie within a given number of standard deviations of the mean of a set of data, if the data are normally distributed.
- Distance from the mean : values within that distance
- µ ± 1σ : 68%
- µ ± 2σ : 95%
- µ ± 3σ : 99.7%
Population Mean
- µ = (∑x)/N
- where x = actual data values
- N = # total terms
standard deviation
- square root of the variance
- σ = sqrt(σ^2)
- σ = sqrt((∑(x-µ)^2)/N)
sum of squares of x
- SSx
- The sum of the squared deviations about the mean of a set of values
variance
- average of the squared deviations about the arithmetic mean for a set of numbers
- Population Variance
- σ^2 = (∑(x-µ)^2)/N
deviation from the mean

x-µ
mean absolute deviation (MAD)
- the average of the absolute values of the deviations around the mean for a set of numbers
- MAD = (∑|x-µ|)/N
- where
- x-µ = actual value of a given number minus the mean
- N= Number of terms
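- A minimal sketch of the population formulas on the cards above (mean, sum of squares, variance, standard deviation, MAD). The data set is hypothetical and is treated as a complete population.

```python
import math

# Hypothetical data set, treated as a whole population (not a sample).
values = [5, 9, 16, 17, 18]
N = len(values)

mu = sum(values) / N                        # population mean: (sum of x) / N
deviations = [x - mu for x in values]       # deviation from the mean: x - mu
ss_x = sum(d ** 2 for d in deviations)      # sum of squares of x (SSx)
variance = ss_x / N                         # population variance
sigma = math.sqrt(variance)                 # population standard deviation
mad = sum(abs(d) for d in deviations) / N   # mean absolute deviation

print(mu, ss_x, variance, round(sigma, 3), mad)
```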
Chebyshev's Theorem
- at least a proportion (1-1/k^2) of the values will fall within ± k standard deviations of the mean, regardless of the shape of the distribution. Assume k>1
- e.g. k=2.5, 1-1/(2.5^2) = .84, so at least .84 of all values are within µ ± 2.5σ.
- or at least .84 of all values will be within 2.5 standard deviations of the mean, µ.
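- The k = 2.5 example can be checked with a short sketch. The function name is hypothetical, not from the course material.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k ** 2


print(round(chebyshev_lower_bound(2.5), 2))   # the card's example, k = 2.5
print(chebyshev_lower_bound(2))               # k = 2
```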
sample variance
- variance: s^2 = (∑(x-x̄)^2)/(n-1)
- also
- s^2 = (∑x^2 - ((∑x)^2)/n)/(n-1)
- where
- x = actual value
- x̄ = sample mean
- n = sample size
sample standard deviation
- s = sqrt(s^2), where
- s^2 = (∑x^2 - ((∑x)^2)/n)/(n-1)
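- A short sketch showing that the definitional and computational formulas for the sample variance agree. The sample values are hypothetical. The standard library's statistics.variance also divides by n-1, so it serves as a cross-check.

```python
import math
import statistics

sample = [5, 9, 16, 17, 18]      # hypothetical sample
n = len(sample)
x_bar = sum(sample) / n          # sample mean

# Definitional formula: squared deviations from the sample mean over (n - 1)
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational formula: (sum of x^2 - (sum of x)^2 / n) over (n - 1)
s2_shortcut = (sum(x ** 2 for x in sample) - sum(sample) ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)     # sample standard deviation

print(s2_definition, s2_shortcut, round(s, 3))
```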
Percentiles
- measures of central tendency that divide a group of data into 100 parts
- 87.7% = 87th Percentile
- percentile location: i=(P/100)n
- where P = percentile
- i= percentile location
- n= number of values in the data set (sorted in ascending order)
- if i is a whole number, the Pth percentile is the average of the values at positions i and i+1
- if i is NOT a whole number, the Pth percentile is the value at position (whole-number part of i) + 1
- e.g. i = 11.8: the whole-number part is 11, so the percentile is the 12th value
- e.g. i = 11: the percentile is the average of the 11th and 12th values (position 11.5)
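- The location rule can be sketched as a small function. The function name and data are hypothetical. The location i is kept in integer arithmetic to avoid floating-point surprises, and P is assumed to be an integer from 1 to 99.

```python
def percentile_value(data, p):
    """Value of the p-th percentile (integer p, 1..99) by the location rule."""
    if not data or not 1 <= p <= 99:
        raise ValueError("need non-empty data and an integer p from 1 to 99")
    ordered = sorted(data)                   # the rule assumes ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # i = (P/100)n = whole + remainder/100
    if remainder == 0:
        # i is a whole number: average the values at positions i and i+1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: value at position (whole part of i) + 1
    return ordered[whole]


scores = [14, 12, 19, 23, 5, 13, 28, 17]     # hypothetical data, n = 8
print(percentile_value(scores, 30))          # i = 2.4, so the 3rd ordered value
print(percentile_value(scores, 25))          # i = 2, so the average of the 2nd and 3rd
```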
frequency distribution
- a summary of data presented in the form of class intervals and frequencies
- e.g. 1 under 3, 3 under 5, etc.
- use classes rule of thumb, 5-15 classes
range

difference between the largest and smallest values of a data set
classes
- 5-15 rule of thumb
- arrangement of values in groups
cumulative frequency

running total of frequency through the classes of a frequency distribution
relative frequency

proportion of total frequency that is in any given class interval in a frequency distribution
class width

range/# classes
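
A rough sketch of building a frequency distribution from ungrouped data, with cumulative and relative frequency. The data and the choice of 5 classes are hypothetical. The sketch assumes integer data and bumps range/# classes up to the next whole number so the top class ends above the largest value.

```python
from itertools import accumulate

data = [12, 7, 3, 17, 9, 10, 5, 11, 16, 8]    # hypothetical ungrouped data
num_classes = 5                                # within the 5-15 rule of thumb

data_range = max(data) - min(data)             # largest minus smallest value
class_width = data_range // num_classes + 1    # range / # classes, bumped up
start = min(data)                              # lower endpoint of the first class

frequency = [0] * num_classes
for x in data:
    frequency[(x - start) // class_width] += 1   # class "lower under upper"

cumulative = list(accumulate(frequency))         # running total of frequency
relative = [f / len(data) for f in frequency]    # share of the total per class

for k in range(num_classes):
    lower = start + k * class_width
    print(f"{lower} under {lower + class_width}: "
          f"{frequency[k]} {cumulative[k]} {relative[k]}")
```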
histogram

vertical bar chart typically used to depict a frequency distribution
frequency polygon

graph in which line segments connect the dots depicting a frequency distribution
ogive

cumulative frequency polygon- most useful for running totals
pie chart
- data represented as parts of a whole
- angle of each slice = (class frequency/total) * 360 degrees
stem & leaf

constructed by separating the digits of each number in the data into 2 groups, a stem and a leaf
pareto
- Vertical bar chart that displays the most common types of defects
- ranked in order of occurrence, left to right
scatter plot
- 2 dimensional plot of pairs of points from 2 variables
- good for attempting to determine the relationship between 2 variables
census
- gather data from a whole population
- data from a census are parameters
Levels of Data
- Lowest to Highest
- Nominal
- Ordinal
- Interval
- Ratio
Nominal
- Lowest level of data: Used only to classify or categorize
- e.g. doctor, lawyer, educator, other
- NON-METRIC Data, aka qualitative data.
Ordinal
- Higher than Nominal, can be used to rank or order subjects
- e.g. not helpful, somewhat helpful, moderately helpful, very helpful, extremely helpful
- NON-METRIC Data, aka qualitative data.
Interval
- Higher than Ordinal
- Distances between consecutive numbers have meaning and the data are always numerical
- e.g. temperature in Fahrenheit or Celsius
Ratio
- Highest Level of data measurement
- Have the same properties as Interval data, but they have an absolute zero, which indicates absence
- Ratio of two numbers is meaningful
- e.g. Height, weight, Kelvin temperature, passenger miles
Parametric Stats

Must be Interval or Ratio
Non-Parametric Stats

Can be used with nominal or ordinal data, and can also be used to analyze interval or ratio data
grouped data

data that have been organized into a frequency distribution
ungrouped data

raw data or data that have not been summarized in any way
median
- middle value in an ordered array of #s.
- for an array with an odd number of values, the median is the middle value
- for an array with an even number of values, the median is the average of the two middle values
- the median is located at position (n+1)/2 of the ordered array
- e.g. for 77 terms the median is the (77+1)/2 = 39th term
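- A minimal sketch of the odd and even cases. The function name and data are hypothetical.

```python
def median_value(values):
    """Middle value of the ordered array, or the average of the two middle values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # position (n+1)/2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the two middle values


print(median_value([3, 1, 2]))       # odd number of values
print(median_value([4, 1, 3, 2]))    # even number of values
```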
Quartiles
- same rules as percentiles: if i is a whole number, the quartile is the average of the values at positions i and i+1
- Q1 = 25th percentile: 25% of the values lie at or below Q1
- Q2 = 50th percentile: 50% of the values lie at or below Q2
- Q3 = 75th percentile: 75% of the values lie at or below Q3
- Q2 is the median
measure of central tendency

yields information about the center, or middle part, of a group of values
mode
- the most frequently occurring value in a set of data
- bimodal- data set has two modes
- multimodal - data set has more than two modes
Inter Quartile Range : IQR
- The range of the middle 50% of values
- IQR = Q3-Q1
- e.g. if Q3 is the 12th term (value 70) and Q1 is the 4th term (value 5), IQR = 70-5 = 65
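- A sketch of Q1, Q2, Q3 and the IQR using the percentile location rule i = (P/100)n. The helper name and the data are hypothetical.

```python
def percentile_value(data, p):
    """Value of the p-th percentile (integer p, 1..99) by the location rule."""
    if not data or not 1 <= p <= 99:
        raise ValueError("need non-empty data and an integer p from 1 to 99")
    ordered = sorted(data)
    whole, remainder = divmod(p * len(ordered), 100)   # i = (P/100)n
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2   # average of positions i, i+1
    return ordered[whole]                                  # position (whole part of i) + 1


scores = [14, 12, 19, 23, 5, 13, 28, 17]   # hypothetical data
q1 = percentile_value(scores, 25)
q2 = percentile_value(scores, 50)          # Q2 is the median
q3 = percentile_value(scores, 75)
iqr = q3 - q1                              # spread of the middle 50% of values

print(q1, q2, q3, iqr)
```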
Coefficient of Variation
- The ratio of the standard deviation to the mean, expressed as a percentage and denoted CV
- CV = (σ/µ)100
- e.g. for σ=4.84 & µ = 64.4, CV = 7.5%
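- The example above, checked in a short sketch (variable names are illustrative only):

```python
sigma = 4.84    # standard deviation from the card's example
mu = 64.4       # mean from the card's example

cv = (sigma / mu) * 100    # coefficient of variation, as a percentage
print(f"CV = {cv:.1f}%")
```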
z score
- number of standard deviations a value (x) is above or below the mean of a set of numbers when the data are normally distributed
- z = (x-µ)/σ
- e.g. x = 1, µ = 4.28, σ = 2.491, z = -1.32
- x = 9, µ = 4.28, σ = 2.491, z = 1.89
- z scores still follow the empirical rule
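- A small sketch of the z score formula, assuming both example values come from the same data set with µ = 4.28 and σ = 2.491. The function name is hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


mu, sigma = 4.28, 2.491
z_low = z_score(1, mu, sigma)     # a value below the mean gives a negative z
z_high = z_score(9, mu, sigma)    # a value above the mean gives a positive z
print(round(z_low, 2), round(z_high, 2))
```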
coefficient of correlation
- correlation: measure of the degree of relatedness of variables
- coefficient of correlation = r
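- r = ∑((x-x̄)(y-ȳ)) / sqrt(∑(x-x̄)^2 * ∑(y-ȳ)^2)
- r ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation); 0 indicates no linear relationship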
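- The card does not spell the equation out. The sketch below uses the standard Pearson product-moment form, r = ∑((x-x̄)(y-ȳ)) / sqrt(∑(x-x̄)^2 * ∑(y-ȳ)^2), which is an assumption about what "big equation" refers to. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))   # co-deviation
    s_xx = sum((a - x_bar) ** 2 for a in x)                       # SSx
    s_yy = sum((b - y_bar) ** 2 for b in y)                       # SSy
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]    # hypothetical paired data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```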
classical method of assigning probability
- involves an experiment which is a process that produces outcomes, and an event, which is an outcome of an experiment.
- P(E) = n_e/N
- Highest probability of an outcome is 1.
- Lowest probability is 0
a priori

probabilities can be determined prior to the experiment
intersection
- contains the elements common to both sets
- X = {1, 2, 3, 4}, Y = {2, 3, 6, 7}, X∩Y = {2, 3}
mutually exclusive events
- when the occurrence of one event precludes the occurrence of another event
- e.g. Male and Female. OK and Defective. A person can not be both Male and Female and a part may not be both OK and Defective
- formula: P(X∩Y) = 0
independent events
- events wherein the occurrence or nonoccurrence of one of the events does not affect the occurrence or nonoccurrence of the other event.
- e.g. Coin tosses or Die Rolls. The previous event does not influence the following event
- formula: Independent Events X & Y
- P(X|Y) = P(X) and P(Y|X) = P(Y)
complement
- All the elementary events of an experiment not in A comprise its complement.
- e.g. If the experiment is rolling a die and the event is 5, then the complement is 1,2,3,4,6
- A'
- P(A') = 1 - P(A)
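- The die example as a sketch with Python sets, using the classical method P(E) = n_e/N. The helper name is hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}    # elementary events of one die roll
A = {5}                              # the event "roll a 5"
A_complement = sample_space - A      # everything in the experiment not in A


def classical_probability(event, space):
    """P(E) = n_e / N for equally likely elementary events."""
    return Fraction(len(event), len(space))


p_A = classical_probability(A, sample_space)
p_A_complement = classical_probability(A_complement, sample_space)
print(sorted(A_complement), p_A, p_A_complement)
```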
relative frequency of occurrence method of assigning probabilities

the probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred.
subjective probability

assigning probability based on the feelings or insights of the person determining the probability
mn counting rule
- If one operation can be done in m ways and a second operation in n ways, the two operations can occur, in order, in mn ways.
- This rule can be extended to 3 or more operations
- e.g. # of Groups possible with the following factors
- gender: 2 (m/f), marital status: 3 (single-never married/married/divorced), economic class: 3 (lower/middle/upper)
- =18 groups. Therefore 18 samples could be taken to represent all groups.
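- The 18 groups can be enumerated with a sketch like the following. The category labels follow the card; the variable names are illustrative.

```python
from itertools import product

gender = ["male", "female"]
marital_status = ["single, never married", "married", "divorced"]
economic_class = ["lower", "middle", "upper"]

# Every ordered combination of one choice per factor: 2 * 3 * 3 ways
groups = list(product(gender, marital_status, economic_class))

print(len(groups))
print(groups[0])
```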
sampling from a population with Replacement
- sampling n items from a population of size N with replacement would provide N^n possibilities
- e.g. A die being rolled 3 times in succession, how many different outcomes can occur?
- N = 6, n=3, 6^3 = 216
- A lottery of reusable numbers 6 digits long from 0-9
- N=10, n=6, 10^6 = 1000000
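- A brief sketch that checks N^n by brute-force enumeration for the die example and by direct computation for the lottery example:

```python
from itertools import product

# Three rolls of a six-sided die: N = 6, n = 3
die_faces = range(1, 7)
roll_outcomes = list(product(die_faces, repeat=3))   # all ordered outcomes

# Six reusable digits from 0-9: N = 10, n = 6
lottery_outcomes = 10 ** 6

print(len(roll_outcomes), lottery_outcomes)
```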
Combinations
- Sampling n items from a population of size N without replacement
- NCn = N!/(n!(N-n)!)
- e.g. three lawyers are to be sent to a conference from a pool of 16
- 16!/(3!13!) = 560
- combination because once selected the lawyer can not be selected again
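- The lawyer example, checked against both the factorial formula and math.comb (available in Python 3.8 and later):

```python
import math

N, n = 16, 3    # choose 3 lawyers from a pool of 16

by_formula = math.factorial(N) // (math.factorial(n) * math.factorial(N - n))
by_comb = math.comb(N, n)    # same count, computed by the standard library

print(by_formula, by_comb)
```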
Spearman Rank
- r_s = 1 - (6∑d^2)/(n(n^2-1))
- where d = difference in the ranks of each pair
- n = number of pairs
- A value near +1 indicates a strong positive correlation
- A value near -1 indicates a strong negative correlation
- e.g. if x and y pairs, and spearman's equals -.830 this indicates a strong inverse correlation,
- that is when x is high y is low and vice-versa
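- A minimal sketch of Spearman's rank correlation, r_s = 1 - (6∑d^2)/(n(n^2-1)). The helper names and data are hypothetical, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; this sketch assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rank(x, y):
    """r_s = 1 - 6 * (sum of d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]    # hypothetical paired data
y = [5, 4, 3, 1, 2]         # y mostly falls as x rises
r_s = spearman_rank(x, y)
print(round(r_s, 3))        # negative: a strong inverse correlation
```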
General Law of Addition
- P(A∪B)= P(A) + P(B) - P(A∩B)
- That is probability of A + probability of B - Probability of A&B together
Special Law of Addition
- Applies only if the events are mutually exclusive
- i.e. male or female, or P(A∩B) = .000
- Then P(A∪B) = P(A) + P(B)
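- Both laws of addition can be checked by counting. The sketch below assumes a standard 52-card deck as the experiment, which is an illustration and not an example from the cards.

```python
from fractions import Fraction
from itertools import product

card_ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(card_ranks, suits))    # 52 equally likely cards


def prob(event):
    """Classical probability: cards in the event over cards in the deck."""
    return Fraction(len(event), len(deck))


kings = {card for card in deck if card[0] == "K"}
queens = {card for card in deck if card[0] == "Q"}
hearts = {card for card in deck if card[1] == "hearts"}

# General law: kings and hearts overlap (the king of hearts)
general = prob(kings) + prob(hearts) - prob(kings & hearts)

# Special law: kings and queens are mutually exclusive
special = prob(kings) + prob(queens)

print(general, special)
```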
General Law of Multiplication
- This gives the probability that both A & B will happen at the same time
- P(A∩B) = P(A) * P(B|A) = P(B) * P(A|B)
- P(A∩B) is the probability that both A and B happen.
- P(A|B) is the probability of A given that B is true
Special Law of Multiplication

If X, Y are independent, P(X∩Y) = P(X) * P(Y)
Independent Events X,Y
- To test to determine if X & Y are independent events, the following must be true
- P(X|Y) = P(X) and P(Y|X) = P(Y)
Conditional Probability

P(X|Y) = P(X∩Y)/P(Y) = (P(X)*P(Y|X))/P(Y)
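
A sketch tying together conditional probability, the general law of multiplication and the independence test. It assumes one card drawn from a standard 52-card deck, with X = "king" and Y = "heart"; this example is an illustration, not from the cards.

```python
from fractions import Fraction

p_x = Fraction(4, 52)          # P(X): 4 kings in the deck
p_y = Fraction(13, 52)         # P(Y): 13 hearts in the deck
p_x_and_y = Fraction(1, 52)    # P(X∩Y): the king of hearts

p_x_given_y = p_x_and_y / p_y  # P(X|Y) = P(X∩Y)/P(Y)
p_y_given_x = p_x_and_y / p_x  # P(Y|X) = P(X∩Y)/P(X)

# General law of multiplication recovers the joint probability
joint_from_law = p_x * p_y_given_x

# Second form of the conditional probability: P(X)*P(Y|X)/P(Y)
second_form = p_x * p_y_given_x / p_y

# Independence test: P(X|Y) = P(X) and P(Y|X) = P(Y)
independent = p_x_given_y == p_x and p_y_given_x == p_y

print(p_x_given_y, p_y_given_x, independent)
```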

Author

gummibear

7317

Card Set

CIS2300_TEST1.txt

Description

CIS 2300 Business statistics test 1

Updated

2/20/2010, 6:05:47 AM
