Financial Modelling: Revision of Basic Statistical Concepts
The flashcards below were created by user jordan_hs on FreezingBlue Flashcards.

Data analysis
• Data analysis is the process used to get from raw data to the results that can be used to make decisions.
 • Results of data analysis can be used for:
 • Detecting trends
 • Making predictions

Descriptive statistics
• Descriptive Statistics answer basic questions about the central tendency and dispersion of data observations.
 • Measures of central tendency
 • Measures of dispersion (variability)
 • Measures of shape

Measures of central tendency
 • Quantification of the location of the middle or centre of a data set.
 • What the typical or average score or result of a data set is.
 Measures of central tendency:
 • Mean
 • Mode
 • Median
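
A minimal sketch of the three measures, using Python's standard `statistics` module. The `returns` values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of daily returns in percent (illustrative values only).
returns = [1.0, 2.0, 2.0, 3.0, 7.0]

mean_value = statistics.mean(returns)      # arithmetic average: 15 / 5
median_value = statistics.median(returns)  # middle value of the sorted data
mode_value = statistics.mode(returns)      # most frequent value

print(mean_value, median_value, mode_value)
```

The single large value (7.0) pulls the mean above the median, while the median and mode stay at the typical value.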

Measures of dispersion
 • Measures of dispersion: statistical measures that summarise the amount of spread or variation in the distribution of values in a variable.
 • How values are spread within a distribution.
 Measures of dispersion:
 • Range
 • Standard deviation
 • Variance

Variance
Measure of the spread

Standard Deviation
Shows how a set of data relates to the mean of the sample data, that is, how far the values typically lie from it

IQR
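 • Looks at data in terms of quarters (quartiles)
 • The ordered data are divided into four quarters; the interquartile range (IQR) is the distance from the first quartile to the third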

IQR excel
QUARTILE(array, quart)
 Quart
 0 = Minimum Value
 1 = First Quartile
 2 = Median Value
 3 = Third Quartile
 4 = Max value
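
The same quartiles can be sketched in Python. This assumes Excel's QUARTILE interpolates the way the "inclusive" method of `statistics.quantiles` does; the `data` values are hypothetical.

```python
import statistics

# Hypothetical, already sorted observations (illustrative values only).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

minimum = min(data)   # corresponds to quart = 0
maximum = max(data)   # corresponds to quart = 4

# quart = 1, 2, 3: first quartile, median, third quartile.
# method="inclusive" treats the data as including its own minimum and maximum.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The interquartile range is the width of the middle half of the data.
iqr = q3 - q1

print(minimum, q1, q2, q3, maximum, iqr)
```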

Variance
 • Where the mean is a measure of the centre of a group of numbers, the variance is the measure of the spread.
 • It involves measuring the distance between each of the values and the mean
 • The larger the variance value the further the observed values of the data set are dispersed from the mean.
 • A variance value of zero means all observed values are the same as the mean.
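
A short worked sketch of the calculation described above, with hypothetical values. It divides by the number of observations (the population form of the variance).

```python
import statistics

# Hypothetical observations (illustrative values only).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Step 1: the mean is the centre of the data.
mean_value = sum(values) / len(values)

# Step 2: squared distance between each value and the mean.
squared_distances = [(v - mean_value) ** 2 for v in values]

# Step 3: average the squared distances (population variance).
population_variance = sum(squared_distances) / len(values)

# The library function gives the same answer.
library_variance = statistics.pvariance(values)

# When every value equals the mean, the variance is zero.
constant_variance = statistics.pvariance([3.0, 3.0, 3.0])

print(mean_value, population_variance, constant_variance)
```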

Standard deviation
 • Standard deviation: how far on average each value is from the mean.
 • Problem with variance: because the differences are squared, the units of variance are not the same as the units of the data.
 • This can make interpretation of the results problematic.
 • Taking the square root of the variance gives a result in the same units as the data set.
 • This square rooting of the variance is reported as the standard deviation.

STDEV.S vs STDEV.P
• To know which function to use, you must know whether you are working with statistics (samples) or parameters (population data).
• A statistic describes a sample; a parameter describes an entire population.
• STDEV.S divides the sum of squared differences by n − 1; STDEV.P divides it by n.
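
A minimal sketch of the two calculations in Python. Here `statistics.stdev` plays the role of STDEV.S (sample, divides by n − 1) and `statistics.pstdev` the role of STDEV.P (population, divides by n); the `values` are hypothetical.

```python
import statistics

# Hypothetical observations (illustrative values only).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treat the values as a sample drawn from a larger population.
sample_sd = statistics.stdev(values)

# Treat the values as the entire population.
population_sd = statistics.pstdev(values)

print(round(sample_sd, 4), round(population_sd, 4))
```

The sample figure is the larger of the two because it divides the same sum of squared differences by a smaller number.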

Measures of shape
Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability of a data set (measures of central tendency vs. measures of dispersion). Skewness and kurtosis further characterize its shape.
 • Skewness measures departure from symmetry and is usually characterized as being left or right skewed.
 • Kurtosis measures “peakedness” of a distribution and comes in two forms, platykurtosis and leptokurtosis.

The Normal Distribution
 MEAN = MEDIAN = MODE
 • Location is determined by the mean, μ
 • Spread is determined by the standard deviation, σ
 • The random variable has an infinite theoretical range: "− infinity" to "+ infinity"

68‐95‐99.7 Rule
 • 68.27% of the values lie within one standard deviation of the mean.
 • 95.45% of the values lie within two standard deviations of the mean.
 • Nearly all (99.73%) of the values lie within three standard deviations of the mean.
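
The three percentages can be checked against the normal cumulative distribution function. A small sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1. The shares are the same
# for any normal distribution, whatever its mean and standard deviation.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```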

Skewness
 • A distribution in which values equidistant from the mean have equal frequencies is called a symmetric distribution.
 • Any departure from symmetry is called skewness.

Skewness part 2
• In a perfectly symmetric distribution, Mean = Median = Mode and the two tails of the distribution are equal in length from the mean.
• If the right tail is longer than the left tail, the distribution is said to have positive skewness.
• If the left tail is longer than the right tail, the distribution is said to have negative skewness.

Measures of Skewness
• The coefficient of skewness measures the degree of asymmetry in the distribution of a variable.
 • Mean < median: negative, or left‐skewed
 • Mean = median: symmetric, or zero skewness
 • Mean > median: positive, or right‐skewed
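
One common way to compute a coefficient of skewness is the third central moment divided by the cube of the standard deviation. The sketch below uses that population form, which is an assumption, since the cards do not say which formula is intended. Excel's SKEW applies a small-sample adjustment, so its numbers can differ slightly. All data values are hypothetical.

```python
import statistics


def skewness(values):
    """Population moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((v - mean_value) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean_value) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets (illustrative values only).
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 2.0, 2.0, 3.0, 12.0]     # long right tail
left_skewed = [-12.0, -3.0, -2.0, -2.0, -1.0]  # mirror image: long left tail

for name, data in [("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)]:
    print(name, statistics.mean(data), statistics.median(data),
          round(skewness(data), 3))
```

In these examples the sign of the coefficient agrees with the mean-versus-median comparison listed above.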

Kurtosis
• The coefficient of Kurtosis is a measure for the degree of peakedness/flatness in the variable distribution.
• When the peak of a curve becomes relatively high then that curve is called Leptokurtic.
• When the curve is flat‐topped, then it is called Platykurtic.
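
A sketch of one common coefficient: the fourth central moment divided by the squared variance, minus 3. Subtracting 3 (the kurtosis of a normal distribution) is a convention assumed here, not something the cards state; under it, positive values suggest a leptokurtic shape and negative values a platykurtic one. The data are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2 ** 2 - 3."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((v - mean_value) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean_value) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical data sets (illustrative values only).
flat = [1.0, 2.0, 3.0, 4.0, 5.0]                          # evenly spread
peaked = [0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 10.0]   # tall centre, thin tails

print(round(excess_kurtosis(flat), 3))    # negative: flatter than normal
print(round(excess_kurtosis(peaked), 3))  # positive: more peaked than normal
```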

Further moments of the distribution
 Statistics that describe the shape of the distribution:
 • 1st moment ‐ Mean (describes central value)
 • 2nd moment ‐ Variance (describes dispersion)
 • 3rd moment ‐ Skewness (describes asymmetry)
 • 4th moment ‐ Kurtosis (describes peakedness)

Hypothesis testing
Goal: make statement(s) regarding unknown population parameter values based on sample data
 Elements
 1) Null hypothesis
 2) Alternative hypothesis
 3) Test statistic
 4) Rejection region

Null Hypothesis
 • States the assumption (numerical) to be tested
 • Is always about a population parameter
 • Begin with the assumption that the null hypothesis is true (innocent until proven guilty)
 • Refers to the status quo
 • Always contains the “=”, “≤” or “≥” sign
 • May or may not be rejected

The alternative hypothesis, H1
 • Opposite of the null hypothesis
 • Challenges the status quo
 • Never contains the “=”, “≤” or “≥” sign
 • May or may not be supported
 • Generally the alternative hypothesis is the one that the researcher is trying to support

Hypothesis testing process
1. There are two hypotheses, the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is true.
3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, or the null is not likely to be true.
4. There are two possible decisions:
 • Conclude that there is enough evidence to support the alternative hypothesis. Reject the null.
 • Conclude that there is not enough evidence to support the alternative hypothesis. Fail to reject the null.

t‐test of hypothesis for the mean
• The t‐test is used to test hypotheses about means (μ) when the population variance is unknown (the usual case). It assumes that the sampled distribution is normal.
 • Use
 • Test for value of a single mean
 • E.g., test to see if a single group of subjects differs from a known value
 • Test for equality of two means
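
A minimal sketch of the single-mean case: the test statistic is the distance between the sample mean and the hypothesised value, measured in standard errors. The sample, the hypothesised mean `mu_0` and the 5% two-tailed setup are all assumptions made for illustration. The critical value 2.262 is the standard t-table entry for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of 10 observations (illustrative values only).
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 5.2, 5.9]
mu_0 = 5.0  # assumed value of the mean under the null hypothesis

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# t statistic with n - 1 degrees of freedom.
t_statistic = (sample_mean - mu_0) / standard_error

# Two-tailed critical value from a t table: alpha = 0.05, df = 9.
critical_value = 2.262
reject_null = abs(t_statistic) > critical_value

print(round(t_statistic, 3), reject_null)
```

Here the statistic falls in the rejection region, so the null hypothesis that the mean equals `mu_0` is rejected at the 5% level.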


Level of significance, α
 • Defines the unlikely values of the sample statistic if the null hypothesis is true
 • Defines rejection region of the sampling distribution
 • Is designated by α (level of significance)
 • Typical values are .01, .05, or .10
• Is selected by the researcher at the beginning
• Provides the critical value(s) of the test

Histogram
Similarities between normal distribution and its histogram
 • There is a single highest bar (the mode)
 • There are as many values above the mode as there are below it (it is in the middle)
 • The shape of the histogram is symmetrical about the mode, so the left side is a mirror image of the right
 • The frequency of values gets lower as you move further from the mode in a way that produces a bell shape

Histograms
 Using the Frequency function
1. Calculate the maximum and minimum return
2. Divide the difference between the maximum and minimum up into intervals, or “Bins”
3. Use the Frequency function to allocate the returns to a Bin

Histograms formula
 =FREQUENCY(A2:A200, C2:C20)
 The first argument (A2:A200) is the data array
 The second argument (C2:C20) is the bins array
 1. Enter the formula once
 2. Select the range of cells you want the formula to apply to
 3. Press Ctrl+Shift+Enter to enter it as an array formula
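
The binning rule can be sketched in Python. The function below is a hypothetical stand-in written for this example, not Excel's implementation. It assumes each value is counted in the first bin whose upper limit is greater than or equal to it, with one extra slot at the end for values above the last bin.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count how many values fall into each bin.

    A value v is counted in slot i when bins[i-1] < v <= bins[i].
    The final extra slot counts values above the largest bin limit.
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # Index of the first bin limit that is >= value.
        counts[bisect_left(edges, value)] += 1
    return counts


# Hypothetical returns in percent and bin upper limits (illustrative only).
returns = [-2.5, -1.0, -0.4, 0.0, 0.3, 0.9, 1.0, 1.8, 2.7]
bins = [-2.0, -1.0, 0.0, 1.0, 2.0]

counts = frequency(returns, bins)
print(counts)
```

Plotting `counts` against the bin limits as a column chart gives the histogram.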

Interpreting the p‐value…
• The smaller the p‐value, the more statistical evidence exists to support the alternative hypothesis.
 • P‐value < 1%, there is overwhelming evidence that supports the alternative hypothesis.
 • P‐value between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
 • P‐value between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
 • P‐value >10%, there is no evidence that supports the alternative hypothesis.
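
The bands above can be written as a small lookup. The function name is hypothetical, and the handling of the exact boundaries (5% counted as strong, 10% as weak) is an assumption, since the cards do not say which band the endpoints belong to.

```python
def interpret_p_value(p_value):
    """Map a p-value to the strength of evidence for the alternative hypothesis."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "overwhelming"
    if p_value <= 0.05:   # assumed: exactly 5% counts as strong
        return "strong"
    if p_value <= 0.10:   # assumed: exactly 10% counts as weak
        return "weak"
    return "none"


for p in (0.004, 0.03, 0.07, 0.25):
    print(p, interpret_p_value(p))
```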