assignment 2
Lecture Two
· When to use what:
· Univariate description: when we describe the distribution of a single variable, we use various numerical measures to characterize the center and dispersion of that distribution.
· Measures of central tendency:
· Mode (frequency-based): the value(s) of the most frequently observed category or categories.
· Can be unimodal, bimodal, or even multimodal.
· Median (rank-order-based): the middle value of a sorted distribution (or the average of the two middle values when the number of cases is even).
· Less sensitive to skewness (or outliers).
· Mean (value-based): sum of all values divided by the number of cases.
· Sensitive to skewness (or outliers).
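A minimal sketch of the three measures, using Python's standard `statistics` module. The scores are made-up illustrative data, and the extra score of 200 is an assumed outlier added only to show how differently the median and the mean react.

```python
import statistics

# Hypothetical exam scores (illustrative data, not from the lecture).
scores = [70, 72, 75, 75, 78, 80, 82]

modes = statistics.multimode(scores)   # list of the most frequent value(s)
median = statistics.median(scores)     # middle value after sorting
mean = statistics.mean(scores)         # sum of values / number of cases

# Add one extreme score and recompute the median and the mean.
scores_with_outlier = scores + [200]
median_out = statistics.median(scores_with_outlier)
mean_out = statistics.mean(scores_with_outlier)

print("mode(s):", modes)
print("median:", median, "->", median_out)
print("mean:", mean, "->", mean_out)
```

In this example the single outlier moves the median only from 75 to 76.5, while the mean jumps from 76 to 91.5.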
· In a histogram:
· From bins to density curve. [draw on board]
· Height of the curve = relative frequency density; area under the curve = 1 (or 100%), so the area over any interval is the relative frequency of that interval.
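One way to see why the total area equals 1 is to start from binned data. The sketch below uses made-up values and an assumed bin width of 2; it scales each bar so that its height is the relative frequency divided by the bin width, which makes the bar areas add up to 1.

```python
import math

# Hypothetical observations and bin layout (assumptions for illustration).
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]
bin_width = 2
bin_starts = [0, 2, 4, 6]  # bins are [0,2), [2,4), [4,6), [6,8)

# Count how many observations fall into each bin.
counts = [sum(1 for v in values if lo <= v < lo + bin_width) for lo in bin_starts]

# Relative frequency = share of all cases in the bin.
rel_freq = [c / len(values) for c in counts]

# Bar height on a density scale = relative frequency / bin width.
heights = [rf / bin_width for rf in rel_freq]

# Total area = sum of (height * width) over all bars.
total_area = sum(h * bin_width for h in heights)

print("counts:", counts)
print("relative frequencies:", rel_freq)
print("total area:", total_area)
```

As the bins get narrower and the sample larger, the tops of the bars trace out the smooth density curve.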
· Where are the mode, median, and mean? (All three are located on the X-axis.)
· Symmetric (unimodal) distribution: mode = median = mean.
· Positive skew (long right tail): typically median < mean.
· Negative skew (long left tail): typically median > mean.
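The relationship between skew and the order of the median and mean can be checked on small made-up data sets. Both lists below are hypothetical: one has a single very large value (long right tail), the other a single very small value (long left tail).

```python
import statistics

# Hypothetical data: one long right tail, one long left tail.
right_skewed = [20, 22, 25, 27, 30, 35, 120]
left_skewed = [10, 85, 88, 90, 92, 95, 98]

def median_and_mean(values):
    """Return (median, mean) for a list of numbers."""
    return statistics.median(values), statistics.mean(values)

right_median, right_mean = median_and_mean(right_skewed)
left_median, left_mean = median_and_mean(left_skewed)

# The mean is pulled toward the tail; the median stays near the bulk of the data.
print("positive skew: median", right_median, "< mean", round(right_mean, 2))
print("negative skew: median", left_median, "> mean", round(left_mean, 2))
```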
· Measures of variability/spread/dispersion:
· Applicable to continuous variables only.
· Rank-order-based measures (i.e., you need to sort the distribution first):
· Introducing percentiles:
· If the Pth percentile is the value X, then P% of the observations in the distribution fall below X. For example, if the 95th percentile of exam grades of SYA 4450 is 86, then 95% of the students in this class have a lower score than 86.
· Graphically, division of the area under the curve. [draw on board]
· Min – 25th percentile (first quartile, Q1) – 50th percentile (median, or Q2) – 75th percentile (third quartile, Q3) – max. Usually identifiable in a frequency table.
· Percentiles are one type of quantile. In addition to quartiles, you will also see centiles.
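The percentile definition above can be sketched directly in code. The helper `percent_below` is a hypothetical name introduced here, and the scores are made-up data. The quartiles come from `statistics.quantiles`, whose default interpolation rule is only one of several conventions, so other software may report slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical exam scores, already sorted (illustrative data).
scores = [55, 60, 62, 65, 68, 70, 72, 74, 75, 77,
          78, 80, 81, 83, 84, 85, 86, 88, 90, 95]

def percent_below(data, x):
    """Percentage of observations in data that fall strictly below x."""
    return 100 * sum(1 for v in data if v < x) / len(data)

# If this returns P, then x sits at the Pth percentile of the data.
rank_of_95 = percent_below(scores, 95)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print("percent of scores below 95:", rank_of_95)
print("Q1, Q2 (median), Q3:", q1, q2, q3)
```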
· Range = max – min
· Very sensitive to extreme values at the ends.
· Interquartile Range (IQR) = Q3 – Q1
· Not sensitive to extreme values beyond the quartiles.
· Semi-interquartile range: IQR/2
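A short sketch of the three rank-order-based measures, again with made-up data. The function name `spread` is hypothetical, and the quartiles use the default rule of `statistics.quantiles`, which is an assumption about how Q1 and Q3 are computed.

```python
import statistics

def spread(data):
    """Return (range, IQR, semi-IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    data_range = max(data) - min(data)
    iqr = q3 - q1
    return data_range, iqr, iqr / 2

# Hypothetical data; the second list replaces the maximum with an extreme value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12]
values_with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 100]

range_a, iqr_a, semi_a = spread(values)
range_b, iqr_b, semi_b = spread(values_with_outlier)

print("range:", range_a, "->", range_b)
print("IQR:", iqr_a, "->", iqr_b)
print("semi-IQR:", semi_a, "->", semi_b)
```

The extreme value changes the range from 10 to 98, while the IQR stays at 6 because the outlier lies beyond Q3.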
· Value-based measures:
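· Population data:
· Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
· Standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
· Sample data:
· Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
· Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$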
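A minimal sketch, assuming the value-based measures intended here are the variance and the standard deviation. The data are made up. The only difference between the population and sample versions is the divisor: the number of cases for a population, and the number of cases minus one for a sample.

```python
import math
import statistics

# Hypothetical data (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Treating the data as a whole population: divide by N.
pop_var = sum_sq_dev / len(data)
pop_sd = math.sqrt(pop_var)

# Treating the data as a sample: divide by n - 1.
sample_var = sum_sq_dev / (len(data) - 1)
sample_sd = math.sqrt(sample_var)

print("population variance, SD:", pop_var, pop_sd)
print("sample variance, SD:", round(sample_var, 3), round(sample_sd, 3))

# The standard library offers the same pairs of functions.
print(statistics.pvariance(data), statistics.variance(data))
```

Because the sample version divides by a smaller number, it is always a little larger than the population version computed on the same values.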
· Degrees of Freedom (DF, df)
· Given that: sample size = n, and the sample mean is known, only n-1 of the values are free to vary (the last one is fixed by the mean);
· DF = n-1
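The idea behind DF = n-1 can be illustrated with a small made-up sample: deviations from the sample mean always sum to zero, so once n-1 of them are known, the last one is determined. The variable names below are hypothetical.

```python
# Hypothetical sample (illustrative data).
data = [3, 7, 8, 10]
n = len(data)
mean = sum(data) / n

# Deviation of each observation from the sample mean.
deviations = [x - mean for x in data]

# Knowing the first n-1 deviations is enough to recover the last one,
# because all n deviations must add up to zero.
implied_last = -sum(deviations[:-1])

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
print("last deviation implied by the others:", implied_last)
print("degrees of freedom:", n - 1)
```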
· Notations:
| | Population | Sample |
| Number of observations | N | n |
| Mean | $\mu$ | $\bar{X}$ |
| Deviation (D) | $X - \mu$ | $X - \bar{X}$ |