The totality of methods we employ to collect and analyze data
Study of how data can be summarized effectively to describe the important aspects of large data sets.
Making forecasts, estimates, or judgements about a larger group from the smaller group actually observed.
All members of a specified group.
Descriptive measure of a population characteristic (mean value, range of investment returns, variance)
Subset of a population
Descriptive measure of a sample characteristic - quantity computed from or used to describe a sample.
Weakest level of measurement. Categorizes data but doesn't rank it (e.g. 1 = small cap fund, 2 = large cap fund) - the numbers carry no rank as to what is bigger, better, etc.
Sorts data into categories that are ordered based on some characteristic. Mutual funds ranked 1 star to 4 stars are on an ordinal scale (4 stars = better than 1 star). However, ordinal scales do not provide any detail on how much better 4 is than 1.
Provide ranking and assurance that differences between scale values are equal (allows for adding and subtracting scale values) - Celsius/Fahrenheit scales = interval scales.
Zero point does not represent absolute zero (e.g. zero degrees does not mean no temperature) - because of this, ratios cannot be formed on an interval scale (e.g. 50 degrees Celsius is not five times as hot as 10 degrees Celsius).
Strongest measurement scale. Has all interval characteristics plus a true zero point, which allows for ratios as well as adding/subtracting amounts within the scale. E.g. zero dollars = no money. Also, twice as much money = twice as much purchasing power.
Tabular display of data summarized into a small number of intervals:
Holding Period Return
R_t = (P_t - P_{t-1} + D_t) / P_{t-1}
P_t = price per share at the end of period t
P_{t-1} = price per share at the end of period t-1, the time period immediately preceding period t
D_t = cash distributions received during period t (dividends/coupon)
Capital gain (or loss) plus distributions divided by the beginning-period price.
Holding Period Return Characteristics (2):
1. Time element - rate of return follows the interval (e.g. monthly time interval = monthly rate of return)
2. Rate of return has no currency unit attached to it (any currency works)
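Worked example of the holding period return definition. A minimal sketch: the function name and the prices are made up for illustration, and the return covers whatever interval the two prices span.

```python
def holding_period_return(p_begin, p_end, distribution=0.0):
    """(P_t - P_{t-1} + D_t) / P_{t-1} for a single period."""
    if p_begin <= 0:
        raise ValueError("beginning-period price must be positive")
    # Capital gain (or loss) plus distributions, over the beginning price.
    return (p_end - p_begin + distribution) / p_begin


# Hypothetical numbers: buy at 100, end the period at 105, receive 2.
hpr = holding_period_return(p_begin=100.0, p_end=105.0, distribution=2.0)
print(f"{hpr:.2%}")  # 7.00%
```

The result is a pure number: the same 7% applies whether the prices are quoted in dollars, euros, or any other currency.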
Creation of Frequency Distribution (7 steps)
1. Sort data in ascending order
2. Calculate range of data (Max value - Min value)
3. Decide on number of intervals (k)
4. Determine interval width (range/k)
5. Determine intervals by adding the interval width to the minimum value - stop after reaching an interval that includes the max value.
6. Count # of observations falling into each interval
7. Construct table of intervals listing smallest to largest that shows # of observations in each.
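The seven steps above can be sketched in a few lines. This is an illustration only: the function name and the data are made up, and it assumes each interval includes its lower limit and excludes its upper limit, except the last interval, which also includes the maximum.

```python
def frequency_distribution(data, k):
    """Return (lower, upper, absolute_frequency) rows for k equal-width intervals."""
    values = sorted(data)                  # step 1: ascending order
    lo, hi = values[0], values[-1]
    data_range = hi - lo                   # step 2: max - min
    if k < 1 or data_range == 0:
        raise ValueError("need k >= 1 and at least two distinct values")
    width = data_range / k                 # steps 3-4: k intervals, width = range / k
    counts = [0] * k
    for x in values:                       # step 6: count observations per interval
        # Assumed convention: lower limit included, upper limit excluded,
        # except the last interval, which also holds the maximum value.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    # steps 5 and 7: intervals listed smallest to largest with their counts
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]


observations = [2, 5, 7, 8, 11, 12, 14, 15, 17, 18, 19, 22]  # made-up data
table = frequency_distribution(observations, k=4)
for lower, upper, freq in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {freq}")
```

Every observation lands in exactly one interval, so the counts always add up to the number of observations.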
Set of values within which an observation falls. Each observation falls into only one interval. Total # of intervals covers all data.
Absolute frequency (frequency)
Actual number of observations in a given interval.
Frequency Distribution Example:
12 observations sorted in ascending order:
Absolute frequency of each interval divided by the total # of observations.
Cumulative Relative Frequency
Adds up the relative frequencies as we move from the first to the last interval. Tells the fraction of observations that are less than the upper limit of each interval (e.g. if there is 1 observation in the first interval, 2 in the second interval, and 30 overall, the cumulative relative frequency through the second interval is 3/30).
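Relative and cumulative frequencies follow directly from the absolute frequencies. A small sketch with made-up counts chosen to match the 1, 2, 30 example (the third count of 27 is an assumption so that the total is 30):

```python
from itertools import accumulate

absolute = [1, 2, 27]                      # hypothetical counts per interval
total = sum(absolute)                      # 30 observations overall

relative = [f / total for f in absolute]   # absolute frequency / total
cumulative_abs = list(accumulate(absolute))
cumulative_rel = [c / total for c in cumulative_abs]

print(cumulative_abs)       # [1, 3, 30]
print(cumulative_rel[1])    # 0.1, i.e. 3/30
```

The last cumulative relative frequency is always 1, because every observation lies at or below the upper limit of the final interval.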
Bar chart of data that has been grouped into a frequency distribution.
Height of each bar represents the absolute frequency for each return interval.
Plot the midpoint of each interval on the x-axis and the absolute frequency for that interval on the y-axis; connect neighboring points with straight lines.
Cumulative Absolute Frequency Distribution
Sum of observations divided by # of observations.
Arithmetic mean of population.
Population mean = (X_1 + X_2 + ... + X_N) / N, i.e. sum of observations / # of observations
N = # of observations in the population
X_i = ith observation
This is a parameter of the population.
n = number of observations in the sample.
Sum of observations divided by # of observations.
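Observations on a set of units (individuals, companies, funds) at a specific point in time (e.g. ROE for each S&P 500 company at a given date)
Sample that consists of observations over time, such as historical monthly data (e.g. monthly returns on the S&P 500)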
Sum of deviations around the mean = 0
Deviations around the mean = each observation minus the mean (X_i - mean)
Middle item of set of items that has been sorted into ascending/descending order.
Even # of observations - median = average of the two middle numbers.
Odd # of observations - median = exact middle number.
Unlike the mean, not affected by extreme values.
Pros/Cons of the mean vs. median and mode
1. Pro - Mean uses all of the information about the size and magnitude of the observations
2. Pro - Mean is easy to work with mathematically.
3. Con - Sensitive to extreme values
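A quick illustration of the sensitivity to extreme values, using made-up observations and Python's standard statistics module:

```python
from statistics import mean, median

base = [2, 3, 4, 5, 6]       # made-up observations
skewed = [2, 3, 4, 5, 60]    # same data with one extreme value

print(mean(base), median(base))      # 4 4
print(mean(skewed), median(skewed))  # 14.8 4

# Deviations around the mean always sum to zero.
deviations = [x - mean(base) for x in base]
print(sum(deviations))               # 0
```

One extreme observation pulls the mean from 4 to 14.8 while the median stays at 4.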
Most frequently occurring value.
Only measure of central tendency that can be used with nominal data.
One mode = unimodal
Two modes = bimodal
Three modes = trimodal
No Mode is possible as well.
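A small sketch of finding the mode(s). The helper name is made up, and it assumes that "no mode" means every value occurs exactly once; other conventions exist.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: all values occur once, so no mode
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 3]))      # [2]      unimodal
print(modes([1, 1, 2, 2, 3]))   # [1, 2]   bimodal
print(modes([1, 2, 3]))         # []       no mode
print(modes(["small cap", "large cap", "large cap"]))  # ['large cap']
```

The last call shows why the mode works with nominal data: it only counts categories and never needs to rank or average them.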
Sum of the weights = 1
Multiply each observation by the weight and sum the values.
Weight in Context of Portfolios (Positive vs. Negative)
Positive Weight = asset held long
Negative Weight = asset held short
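Portfolio Return = w_1*R_1 + w_2*R_2 + ... + w_n*R_n (w_i = weight of asset i, R_i = return on asset i)
Weighted sum. Portfolio's return is the weighted average of the returns on the assets in the portfolio, so the weighted mean is the right measure here; the simple arithmetic mean is only correct when all weights are equal.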
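Portfolio return as a weighted mean, sketched with made-up weights and returns. The function name is hypothetical; the check that weights sum to 1 follows the rule stated above.

```python
import math


def portfolio_return(weights, returns):
    """Weighted mean: multiply each return by its weight and sum."""
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * r for w, r in zip(weights, returns))


# Long-only: 60% in an asset returning 10%, 40% in one returning 5%.
long_only = portfolio_return([0.6, 0.4], [0.10, 0.05])

# Long-short: 130% long the first asset, 30% short the second.
long_short = portfolio_return([1.3, -0.3], [0.10, 0.05])

print(round(long_only, 4), round(long_short, 4))  # 0.08 0.115
```

The negative weight on the shorted asset still fits the rule, because 1.3 + (-0.3) = 1.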
Weighted average of forward-looking data
Typically used to average rates of change over time or compute the growth rate of a variable.
X = observations
n = # of observations
Observations must be >0 for this to work.
Geometric Mean for Portfolio Returns
Must make values non negative (add 1)
Also known as Compound Returns.
Take the nth root (n = # of observations) of the product of all (1 + return) values, then subtract one at the end.
E.g. with two returns: [(1 + R_1)*(1 + R_2)]^(1/2) - 1 (square root, because n = 2).
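A minimal sketch of the geometric mean return, using made-up returns of +10% and -10%. The function name is hypothetical.

```python
import math


def geometric_mean_return(returns):
    """nth root of the product of (1 + R) values, minus 1."""
    growth = [1 + r for r in returns]          # add 1 to each return
    if any(g < 0 for g in growth):
        raise ValueError("each 1 + return must be non-negative")
    return math.prod(growth) ** (1 / len(growth)) - 1


returns = [0.10, -0.10]                        # hypothetical period returns
geometric = geometric_mean_return(returns)
arithmetic = sum(returns) / len(returns)

print(f"{arithmetic:.4%}")   # 0.0000%
print(f"{geometric:.4%}")    # -0.5013%
```

The arithmetic mean says 0%, but an investment of 100 ends at 99 after the two periods; the geometric mean of about -0.5% per period is the compound rate that reproduces that ending value.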
Geometric Mean vs. Arithmetic Mean
Geometric mean is always less than the arithmetic mean unless there is no variability in the observations; then the two are equal.
Difference between the two increases with more variability in the period-by-period observations.
Geometric Mean Return represents:
Growth rate or compound rate of return on an investment.
Focuses on profitability of an investment over a multi-period horizon.
Sum reciprocals of the observations (1/X) and average that sum by dividing it by the # of observations and finally, take the reciprocal of that average.
Special weighted mean in which observation's weight is inversely proportional to its magnitude.
Used in cost averaging, where a fixed amount of money is periodically invested (e.g. 1000 a month). The ratios we would be averaging are the prices per share at different purchase dates, but we are applying those to a constant amount of money to yield a variable number of shares.
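Cost averaging example with made-up prices, showing that the average price paid per share equals the harmonic mean of the purchase prices. The function name is hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("observations must be positive")
    return len(values) / sum(1 / v for v in values)


prices = [10.0, 20.0]      # hypothetical price per share at each purchase date
amount = 1000.0            # fixed amount invested each period

shares_bought = sum(amount / p for p in prices)        # 100 + 50 shares
average_cost = amount * len(prices) / shares_bought    # total spent / total shares

print(round(average_cost, 4))           # 13.3333
print(round(harmonic_mean(prices), 4))  # 13.3333
```

The arithmetic mean of the two prices is 15, but more shares are bought at the lower price, so the average cost per share is lower than that.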
Harmonic Mean vs. Arithmetic Mean vs. Geometric Mean
Harmonic mean is less than geometric mean, which is less than arithmetic mean, unless all observations are the same - all three would be equal in this case.
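The ordering can be checked numerically with Python's standard statistics module (geometric_mean requires Python 3.8 or later). The data are made up.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

varied = [2.0, 4.0, 8.0]     # made-up positive observations
h, g, a = harmonic_mean(varied), geometric_mean(varied), mean(varied)
print(round(h, 4), round(g, 4), round(a, 4))  # 3.4286 4.0 4.6667

constant = [5.0, 5.0, 5.0]   # no variability
same = (harmonic_mean(constant), geometric_mean(constant), mean(constant))
print(all(math.isclose(value, 5.0) for value in same))  # True
```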
Value at or below which a stated fraction of the data lies
Quartile = divide data in quarters
Quintile = divide data in fifths
Decile = divide data in tenths
Percentile = divide data in hundredths
First Quartile (Q1)
Divides a distribution such that 25% of the observations lie at or below it.
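Quartiles, quintiles, and deciles can be computed with statistics.quantiles from the Python standard library (Python 3.8 or later). This is a sketch on made-up data; the "exclusive" interpolation method used here is one common convention and is an assumption, since the notes do not state a location formula.

```python
from statistics import quantiles

data = list(range(1, 12))    # made-up observations 1, 2, ..., 11

# n is the number of equal groups; the result holds the n - 1 cut points.
quartiles = quantiles(data, n=4, method="exclusive")   # Q1, Q2, Q3
quintiles = quantiles(data, n=5, method="exclusive")   # 4 cut points
deciles = quantiles(data, n=10, method="exclusive")    # 9 cut points

q1 = quartiles[0]
share_at_or_below_q1 = sum(x <= q1 for x in data) / len(data)

print(quartiles)                        # [3.0, 6.0, 9.0]
print(round(share_at_or_below_q1, 3))   # 0.273
```

With only 11 observations the share at or below Q1 cannot be exactly 25%; 3 of 11 is the closest the data allow, and the match tightens as the sample grows.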