Lecture Two

· When to use what:

· Univariate description: when we describe the distribution of a single variable, we use various numerical measures to characterize the center and dispersion of that distribution.

· Measures of central tendency:

· Mode (frequency-based): the value(s) of the most frequently observed category (or categories).

· Can be unimodal, bimodal, or even multimodal.

· Median (rank-order-based): the middle value in a sorted distribution (or the average of the two middle values when the number of cases is even).

· Less sensitive to skewness (or outliers).

· Mean (value-based): sum of all values divided by the number of cases.

· Sensitive to skewness (or outliers).
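As a minimal sketch of the three measures, the snippet below uses Python's standard `statistics` module on a small made-up data set. The numbers are hypothetical and chosen only so the arithmetic is easy to follow. It also adds one extreme value to show that the mean moves far more than the median.

```python
from statistics import mean, median, multimode

# Hypothetical data, invented for illustration (not from the lecture).
values = [2, 3, 3, 4, 5, 7]

modes = multimode(values)   # most frequent value(s), returned as a list
middle = median(values)     # average of the two middle values, since n is even
average = mean(values)      # sum of all values divided by the number of cases
print(modes, middle, average)  # mode 3, median 3.5, mean 4

# Add a single extreme value and recompute.
with_outlier = values + [60]
print(median(with_outlier), mean(with_outlier))  # median 4, mean 12

# A distribution can have more than one mode (bimodal here).
print(multimode([1, 1, 2, 2, 3]))  # modes 1 and 2
```

The median shifts from 3.5 to 4, while the mean jumps from 4 to 12, which is the sensitivity to outliers described above.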

· In a histogram:

· From bins to density curve. [draw on board]

· Height of the curve = relative frequency (density); total area under the curve = 1 (or 100%).

· Where are the mode, median, and mean? (All are located on the X-axis.)

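· Symmetric (unimodal) distribution: mode = median = mean.

· Positive (right) skew: typically median < mean, because the long right tail pulls the mean upward.

· Negative (left) skew: typically median > mean, because the long left tail pulls the mean downward.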
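The ordering of the median and mean under skew can be checked numerically. The sketch below assumes three small hypothetical data sets (one symmetric, one with a long right tail, one with a long left tail) and a helper named `center`, which is introduced here for illustration only.

```python
from statistics import mean, median, multimode

def center(values):
    """Return (mode, median, mean) for a unimodal list of numbers."""
    return multimode(values)[0], median(values), mean(values)

# Hypothetical data sets, invented for illustration.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positive_skew = [1, 2, 2, 2, 3, 4, 5, 13]       # long right tail
negative_skew = [1, 9, 10, 11, 12, 12, 12, 13]  # long left tail

for name, data in [("symmetric", symmetric),
                   ("positive skew", positive_skew),
                   ("negative skew", negative_skew)]:
    mode_, median_, mean_ = center(data)
    print(f"{name}: mode={mode_}, median={median_}, mean={mean_}")
```

In these examples the symmetric set gives mode = median = mean = 3, the right-skewed set gives median 2.5 < mean 4, and the left-skewed set gives median 11.5 > mean 10.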

· Measures of variability/spread/dispersion:

· Applicable to continuous variables only.

· Rank-order-based measures (i.e., you need to sort the distribution first):

· Introducing percentiles:

· If the Pth percentile is the value X, then P% of the observations in the distribution fall below X. For example, if the 95th percentile of exam grades of SYA 4450 is 86, then 95% of the students in this class have a lower score than 86.

· Graphically, percentiles divide the area under the curve. [draw on board]

· Min – 25th percentile (first quartile, Q1) – 50th percentile (median, or Q2) – 75th percentile (third quartile, Q3) – max. These are usually identifiable in a frequency table.

· Percentiles are one kind of quantile. In addition to quartiles, you will also see centiles.
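The percentile definition above ("P% of the observations fall below X") can be sketched directly. The function `percent_below` and the grade list are hypothetical, introduced only to mirror the exam-grade example. Statistical software often uses interpolation rules that give slightly different percentile values, so this is one convention, not the only one.

```python
def percent_below(values, x):
    """Percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical grades for a class of 20 students, invented for illustration:
# nineteen scores from 60 to 78, plus one score of 90.
grades = [60 + i for i in range(19)] + [90]

# 19 of the 20 students scored below 86, so 86 sits at the 95th percentile.
print(percent_below(grades, 86))  # 95.0
```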

· Range = max – min

· Very sensitive to extreme values at the ends.

· Interquartile Range (IQR) = Q3 – Q1

· Not sensitive to extreme values beyond the quartiles.

· Semi-interquartile range: IQR/2
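A minimal sketch of the rank-order-based measures, assuming Python's `statistics.quantiles` with `method="inclusive"` as the quartile rule (other rules exist and can give slightly different quartiles). The helper `spread` and both data sets are hypothetical.

```python
from statistics import quantiles

def spread(values):
    """Return the five-number summary plus range, IQR, and semi-IQR."""
    ordered = sorted(values)  # rank-order measures need a sorted distribution
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "min": ordered[0], "Q1": q1, "median": q2, "Q3": q3, "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "IQR": iqr,
        "semi_IQR": iqr / 2,
    }

# Hypothetical data, invented for illustration.
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9]
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same data with one extreme maximum

print(spread(baseline))
print(spread(extreme))
```

Replacing the maximum 9 with 90 changes the range from 8 to 89, while Q1, Q3, and therefore the IQR (4.0) stay the same.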

· Value-based measures:

· Population data:

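· Variance (Var): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

· Standard Deviation (SD): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$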

· Sample data:

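· Variance (Var): $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

· Standard Deviation (SD): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$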

· Degrees of Freedom (DF, df)

· Given that: sample size = n, and the sample mean is known;

· DF = n-1
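To make the population/sample distinction and the role of n - 1 concrete, here is a small sketch on hypothetical data. It computes the sum of squared deviations by hand, divides by N for the population formulas and by n - 1 for the sample formulas, and compares the results with `pvariance`, `pstdev`, `variance`, and `stdev` from Python's `statistics` module. The last lines illustrate why one degree of freedom is lost: deviations from the mean sum to zero, so the final deviation is fixed once the other n - 1 are known.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
m = sum(values) / n                       # mean = 5.0

deviations = [x - m for x in values]      # D = X_i - mean
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations = 32.0

pop_var = ss / n                          # treat the data as a population
samp_var = ss / (n - 1)                   # treat the data as a sample (DF = n - 1)
pop_sd, samp_sd = sqrt(pop_var), sqrt(samp_var)

# The library functions follow the same two definitions.
assert isclose(pop_var, pvariance(values)) and isclose(pop_sd, pstdev(values))
assert isclose(samp_var, variance(values)) and isclose(samp_sd, stdev(values))

# Deviations sum to zero, so the last one is determined by the first n - 1.
last_from_others = -sum(deviations[:-1])
print(pop_var, pop_sd)                    # 4.0 2.0
print(last_from_others == deviations[-1])  # True
```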

· Notations:

                          Population        Sample

Number of observations    N                 n

Mean                      $\mu$             $\bar{X}$

Deviation (D)             $X_i - \mu$       $X_i - \bar{X}$