statistic 1
Key Concepts and Review
Review of Basic Statistics
Module 1
• Classifying Data
o Qualitative Data
▪ Consists of names or labels, categorical data
o Quantitative Data
▪ Consists of numeric values, includes units of measurement
• Discrete
o Only takes on countable values
• Continuous
o Can take on any value within an interval
• Levels of Measure
o Nominal
▪ Categories only, data cannot be arranged in order
o Ordinal
▪ Data can be arranged in order, but differences either can’t be found or
are meaningless
o Interval
▪ Differences are meaningful but there is no natural zero starting point
and ratios are meaningless
o Ratio
▪ There is a natural zero starting point and ratios make sense
• Measures of Center
o Mean
▪ The arithmetic average of all data points
▪ In Excel =AVERAGE(data)
o Median
▪ The middle of an ordered data set
▪ In Excel =MEDIAN(data)
o Mode
▪ The most frequently appearing data
▪ In Excel =MODE.MULT(data). To get multiple outputs, highlight
several cells in a single column, type the formula in the formula bar,
then press Ctrl+Shift+Enter at the same time.
• Measures of Variation
o Range
▪ The distance between the lowest and highest value
▪ In Excel, =MAX(data) - MIN(data)
o Variance
▪ Equal to the square of the standard deviation
▪ =VAR.S(data)
o Standard Deviation
▪ A measurement of how far the data deviates from the mean
▪ =STDEV.S(data)
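The measures of center and variation above can also be reproduced outside Excel. The following is a minimal Python sketch using the standard library; the data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data, not taken from the course material.
data = [2, 4, 4, 6, 8, 9, 9]

mean = statistics.mean(data)          # comparable to =AVERAGE(data)
median = statistics.median(data)      # comparable to =MEDIAN(data)
modes = statistics.multimode(data)    # comparable to =MODE.MULT(data)
data_range = max(data) - min(data)    # comparable to =MAX(data) - MIN(data)
variance = statistics.variance(data)  # sample variance, comparable to =VAR.S(data)
std_dev = statistics.stdev(data)      # sample standard deviation, comparable to =STDEV.S(data)

print(mean, median, modes, data_range, variance, std_dev)
```

Note that the variance is the square of the standard deviation, so std_dev ** 2 should match variance up to rounding.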
Module 2
Normal Distributions
o A bell curve distribution whose shape is determined by a mean and a standard
deviation
o Standard Normal Distributions and 𝑧-scores
▪ When the mean is 0 and the standard deviation is 1 it is called a
standard normal distribution. Measurements on this scale are identified
by the variable 𝑧
o Find a 𝑧-score
▪ Find a z-score from a probability with =NORM.S.INV(probability)
o Area and probability
▪ Find a probability from a z-score with =NORM.S.DIST(z, TRUE)
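As a rough Python counterpart to the two Excel formulas above, the sketch below uses NormalDist from the standard library. The function names are hypothetical and exist only for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_from_probability(probability):
    """Comparable to =NORM.S.INV(probability): z with this area to its left."""
    return standard_normal.inv_cdf(probability)

def probability_from_z(z):
    """Comparable to =NORM.S.DIST(z, TRUE): area to the left of z."""
    return standard_normal.cdf(z)

print(z_from_probability(0.975))
print(probability_from_z(0))
```

The two functions are inverses of each other, so probability_from_z(z_from_probability(p)) should return p up to rounding.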
Module 3
• Parameters and Statistics
o A parameter is a measurement (usually a proportion, a mean, or a standard
deviation) of an entire population. Usually these values are either impossible
or unrealistic to find.
o A statistic is a measurement used to estimate a parameter that is based on a
sample taken from a population.
Measurement          Parameter                       Statistic
Mean (average)       Mu ($\mu$)                      x-bar ($\bar{x}$)
Proportion           $p$                             p-hat ($\hat{p}$)
Standard Deviation   Sigma ($\sigma$)                $s$
Variance             Sigma squared ($\sigma^2$)      $s^2$
Confidence Intervals
o A confidence interval is a range of values used to estimate a population
parameter. It is made by taking a point estimate obtained from a sample
and then calculating a margin of error about that point estimate.
o The point estimate is a single value used to estimate a population parameter.
Each parameter type has its own best point estimate statistic.
o Margins of error are computed differently depending on the population
parameter being estimated.
o A confidence level (1 − 𝛼) is the probability that the procedure used to
construct the confidence interval produces an interval that contains the
population parameter.
• Constructing Confidence Intervals for a Population Proportion 𝒑.
o Best point estimate: $\hat{p} = \frac{x}{n}$
o Critical Value: $z_{\alpha/2}$
▪ =NORM.S.INV(1-alpha/2)
o Margin of Error: $E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
o Confidence Interval: $\hat{p} - E < p < \hat{p} + E$
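The construction above can be sketched in Python as follows. This is an illustration only: the function name and the default confidence level are assumptions, and NormalDist().inv_cdf plays the role of =NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence_level=0.95):
    """Confidence interval for a population proportion.

    x is the number of successes and n the sample size (hypothetical names).
    """
    alpha = 1 - confidence_level
    p_hat = x / n                                 # best point estimate
    q_hat = 1 - p_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_(alpha/2)
    margin = z_crit * sqrt(p_hat * q_hat / n)     # margin of error E
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(50, 100)
print(low, high)
```

With 50 successes in a sample of 100, the interval is centered on 0.5 and extends roughly 0.098 on each side.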
• Constructing Confidence Intervals for a Population Mean 𝝁.
o Best Point Estimate: $\bar{x}$
o Critical Value: $t_{\alpha/2}$
▪ =T.INV(1-alpha/2, n-1)
o Margin of Error: $E = t_{\alpha/2} \frac{s}{\sqrt{n}}$
o Confidence Interval: $\bar{x} - E < \mu < \bar{x} + E$
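A minimal Python sketch of the same construction is shown below. Python's standard library has no inverse t distribution, so this sketch assumes the critical value is looked up separately (for example with Excel's T.INV or a t table) and passed in. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def mean_confidence_interval(sample, t_crit):
    """Confidence interval for a population mean.

    t_crit is assumed to be the critical value t_(alpha/2) with n - 1
    degrees of freedom, obtained outside this function.
    """
    n = len(sample)
    x_bar = mean(sample)                        # best point estimate
    margin = t_crit * stdev(sample) / sqrt(n)   # margin of error E
    return x_bar - margin, x_bar + margin

# 4.303 is the table value for 95% confidence with 2 degrees of freedom.
low, high = mean_confidence_interval([10, 12, 14], t_crit=4.303)
print(low, high)
```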
• Constructing Confidence Intervals for a Population Standard Deviation 𝝈.
o Best Point Estimate: 𝑠
o Left Critical Value: $\chi^2_L$ (Chi-squared left)
▪ =CHISQ.INV(alpha/2, n-1)
o Right Critical Value: $\chi^2_R$ (Chi-squared right)
▪ =CHISQ.INV.RT(alpha/2, n-1)
o Confidence Interval: $\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$
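The interval for a standard deviation can be sketched in Python as below. The standard library has no chi-squared distribution, so this sketch assumes both critical values are obtained separately (for example with CHISQ.INV and CHISQ.INV.RT) and passed in. The function and parameter names are hypothetical.

```python
from math import sqrt

def std_dev_confidence_interval(n, s, chi2_left, chi2_right):
    """Confidence interval for a population standard deviation.

    chi2_left and chi2_right are assumed to be the left and right
    chi-squared critical values with n - 1 degrees of freedom.
    """
    lower = sqrt((n - 1) * s ** 2 / chi2_right)  # larger critical value gives the lower bound
    upper = sqrt((n - 1) * s ** 2 / chi2_left)   # smaller critical value gives the upper bound
    return lower, upper

# 2.700 and 19.023 are the table values for alpha = 0.05 with 9 degrees of freedom.
lower, upper = std_dev_confidence_interval(n=10, s=2, chi2_left=2.700, chi2_right=19.023)
print(lower, upper)
```

Because the right critical value is the larger one, it appears in the denominator of the lower bound.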
Module 4
Hypothesis Test Overview
• Steps of Hypothesis Test
o Step 1: Identify the claim to be tested and present it in symbolic form.
o Step 2: Write the null and alternative hypotheses.
o Step 3: Determine the significance level.
o Step 4: Identify the test statistic and the direction of the test.
o Step 5: Determine the decision criteria (Critical value method or P-Value
method)
▪ Critical Value Method: If the test statistic is more extreme than the
critical value(s), reject the null hypothesis.
▪ P-Value Method: If the P-value is less than or equal to 𝛼, reject the
null hypothesis.
o Step 6: Find the test statistic, then:
▪ Find the P-Value OR
▪ Find the critical value
o Step 7: Make a decision on the null hypothesis (reject or do not reject)
o Step 8: State a conclusion about the original claim (the evidence does
support the claim or the evidence does not support the claim)
• Null Hypothesis
o ALWAYS uses an equal sign
• Alternative Hypothesis
o Claim uses ‘equal to’ type language
▪ 𝐻0 uses an equal sign
▪ 𝐻𝑎 uses a not equal to sign
▪ Two-tailed test
▪ Decision Criteria
▪ If the test statistic is less than the negative critical value or
greater than the positive critical value reject 𝐻0 and conclude
that the evidence does not support the claim.
▪ If the test statistic is between the two critical values do not
reject 𝐻0 and conclude that there is not sufficient evidence to
reject the claim.
o Claim uses ‘not equal to’ type language
▪ 𝐻0 uses an equal sign
▪ 𝐻𝑎 uses a not equal to sign
▪ Two-tailed test
▪ Decision Criteria
▪ If the test statistic is less than the negative critical value or
greater than the positive critical value reject 𝐻0 and conclude
that the evidence supports the claim.
▪ If the test statistic is between the two critical values do not
reject 𝐻0 and conclude that the evidence does not support
the claim.
o Claim uses ‘less than’ type language
▪ 𝐻0 uses an equal sign
▪ 𝐻𝑎 uses a less than sign
▪ Left-tailed test
▪ Decision Criteria
▪ If the test statistic is less than the critical value reject 𝐻0 and
conclude that the evidence supports the claim.
▪ If the test statistic is greater than the critical value do not
reject 𝐻0 and conclude that the evidence does not support
the claim
o Claim uses ‘greater than’ type language
▪ 𝐻0 uses an equal sign
▪ 𝐻𝑎 uses a greater than sign
▪ Right-tailed test
▪ Decision Criteria
▪ If the test statistic is greater than the critical value then reject
𝐻0 and conclude that the evidence supports the claim
▪ If the test statistic is less than the critical value do not reject
𝐻0 and conclude that the evidence does not support the
claim
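The decision criteria for the critical value method can be summarized in a short Python sketch. The function name and the tail labels are hypothetical, and the two-tailed branch assumes a symmetric statistic (𝑧 or 𝑡), so it does not apply to a 𝜒2 statistic.

```python
def reject_null(test_stat, critical_value, tail):
    """Return True when the critical value method says to reject H0.

    tail is 'left', 'right', or 'two' (labels chosen for this sketch).
    For 'left', critical_value is the negative left-tail value.
    For 'right' and 'two', critical_value is the positive value.
    """
    if tail == "left":
        return test_stat < critical_value
    if tail == "right":
        return test_stat > critical_value
    if tail == "two":
        # Symmetric statistic assumed: critical values are -critical_value and +critical_value.
        return test_stat < -critical_value or test_stat > critical_value
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(reject_null(-2.10, -1.645, "left"))
print(reject_null(1.00, 1.645, "right"))
print(reject_null(-2.50, 1.96, "two"))
```

Whether rejecting 𝐻0 supports the claim depends on whether the claim corresponds to 𝐻0 or to 𝐻𝑎, as listed above.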
• Critical Value Excel Formulas
o Left-tailed:
▪ 𝑡 statistic: =T.INV(alpha, d.o.f.)
▪ 𝑧 statistic: =NORM.S.INV(alpha)
▪ 𝜒2 statistic: =CHISQ.INV(alpha, d.o.f.)
o Right-tailed:
▪ 𝑡 statistic: =T.INV(1-alpha, d.o.f.)
▪ 𝑧 statistic: =NORM.S.INV(1-alpha)
▪ 𝜒2 statistic: =CHISQ.INV.RT(alpha, d.o.f.)
o Two-tailed:
▪ 𝑡 statistic: =T.INV(1-alpha/2, d.o.f.) and then uses both positive and
negative values
▪ 𝑧 statistic: =NORM.S.INV(1-alpha/2) and then uses both positive
and negative values
▪ 𝜒2 statistic: both =CHISQ.INV(alpha/2, d.o.f.) and
=CHISQ.INV.RT(alpha/2, d.o.f.)
• P-Value Excel Formulas
o Left-tailed:
▪ 𝑡 statistic: =T.DIST(test stat, d.o.f., TRUE)
▪ 𝑧 statistic: =NORM.S.DIST(test stat, TRUE)
▪ 𝜒2 statistic: =CHISQ.DIST(test stat, d.o.f., TRUE)
o Right-tailed:
▪ 𝑡 statistic: =1-T.DIST(test stat, d.o.f., TRUE)
▪ 𝑧 statistic: =1-NORM.S.DIST(test stat, TRUE)
▪ 𝜒2 statistic: =CHISQ.DIST.RT(test stat, d.o.f.)
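o Two-tailed: use the left-tailed formula shown above for a negative test
statistic or the right-tailed formula for a positive test statistic, then
multiply the result by 2 (for a 𝜒2 statistic, double the smaller of the two
tail areas)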
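For a 𝑧 statistic, the P-value calculations can be sketched in Python with the standard library. The function names and tail labels are hypothetical; the two-tailed branch doubles the smaller tail area, which is the usual convention for a two-tailed P-value.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    left_area = NormalDist().cdf(z)  # comparable to =NORM.S.DIST(z, TRUE)
    if tail == "left":
        return left_area
    if tail == "right":
        return 1 - left_area
    if tail == "two":
        # Double the smaller of the two tail areas.
        return 2 * min(left_area, 1 - left_area)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def p_value_rejects_null(p_value, alpha):
    """P-value method: reject H0 when the P-value is at most alpha."""
    return p_value <= alpha

p = z_p_value(-2.5, "two")
print(p, p_value_rejects_null(p, 0.05))
```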
• Test Statistic and Critical Value Types
o The test statistic will always be the same type of statistic as the critical
value. Which one you use depends on the parameter being tested.
▪ Claims about a mean will use a 𝑡 statistic and the T. formulas for
critical values and P-Values.
▪ Claims about a proportion will use a 𝑧 statistic and the NORM.S
formulas for critical values and P-Values.
▪ Claims about a standard deviation will use a 𝜒2 statistic and the
CHISQ. formulas for critical values and P-Values.
Test Statistic Formulas
• Tests about a Population Proportion
o Use a 𝑧 test statistic and a 𝑧 critical value(s)
o $z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$
▪ 𝑝 is determined by 𝐻0, and $q = 1 - p$
• Tests about a Population Mean
o Use a 𝑡 test statistic and a 𝑡 critical value(s)
o $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
▪ 𝜇 is determined by 𝐻0
• Tests about a Population Standard Deviation
o Use a 𝜒2 (chi-squared) test statistic and a 𝜒2 critical value(s)
o $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
▪ 𝜎 is determined by 𝐻0
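The three one-sample test statistics above can be computed with a few lines of Python. This is a sketch only; the function names are hypothetical, and p0, mu0, and sigma0 stand for the values stated in 𝐻0.

```python
from math import sqrt
from statistics import mean, stdev

def proportion_z_statistic(x, n, p0):
    """z test statistic for a claim about a proportion (p0 from H0)."""
    p_hat = x / n
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

def mean_t_statistic(sample, mu0):
    """t test statistic for a claim about a mean (mu0 from H0)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

def std_dev_chi2_statistic(n, s, sigma0):
    """Chi-squared test statistic for a claim about a standard deviation (sigma0 from H0)."""
    return (n - 1) * s ** 2 / sigma0 ** 2

print(proportion_z_statistic(60, 100, 0.5))
print(mean_t_statistic([10, 12, 14], 10))
print(std_dev_chi2_statistic(10, 2, 3))
```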
Module 5
• Comparing Two Population Proportions
o Use a 𝑧 test statistic and a 𝑧 critical value(s)
o Test statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}$
▪ $\hat{p}_1 = \frac{x_1}{n_1}$, sample proportion from 1st sample
▪ $\hat{p}_2 = \frac{x_2}{n_2}$, sample proportion from 2nd sample
▪ 𝑝1 – population proportion from 1st population
▪ 𝑝2 – population proportion from 2nd population
▪ From 𝐻0, assume 𝑝1 = 𝑝2, i.e. 𝑝1 − 𝑝2 = 0
▪ $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$, pooled proportion
▪ $\bar{q} = 1 - \bar{p}$
▪ 𝑛1 – sample size from 1st population
▪ 𝑛2 – sample size from 2nd population
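A minimal Python sketch of the two-proportion test statistic follows, assuming 𝐻0 states 𝑝1 = 𝑝2 so the (𝑝1 − 𝑝2) term is zero. The function name and the sample counts are hypothetical.

```python
from math import sqrt

def two_proportion_z_statistic(x1, n1, x2, n2):
    """z test statistic for comparing two proportions, assuming p1 - p2 = 0 under H0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    q_bar = 1 - p_bar
    standard_error = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p1_hat - p2_hat) / standard_error

print(two_proportion_z_statistic(60, 100, 40, 100))
```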
• Comparing Two Means, Independent Samples
o Use a 𝑡 test statistic and a 𝑡 critical value(s)
o Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
▪ $\bar{x}_1$ – sample mean from 1st sample
▪ $\bar{x}_2$ – sample mean from 2nd sample
▪ 𝜇1 – population mean from 1st population
▪ 𝜇2 – population mean from 2nd population
▪ From 𝐻0, assume 𝜇1 = 𝜇2, i.e. 𝜇1 − 𝜇2 = 0
▪ 𝑠1 – sample standard deviation from 1st sample
▪ 𝑠2 – sample standard deviation from 2nd sample
▪ 𝑛1 – sample size from 1st population
▪ 𝑛2 – sample size from 2nd population
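The test statistic for two independent samples can be sketched in Python as below, assuming 𝐻0 states 𝜇1 = 𝜇2. The function name and the sample values are hypothetical, and the sketch computes only the statistic, not the degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t_statistic(sample1, sample2):
    """t test statistic for two independent samples, assuming mu1 - mu2 = 0 under H0."""
    n1, n2 = len(sample1), len(sample2)
    # variance() returns the sample variance, i.e. s squared.
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error

print(two_sample_t_statistic([10, 12, 14], [7, 9, 11]))
```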
• Comparing Dependent Samples, Matched Pairs
o Use a 𝑡 test statistic and a 𝑡 critical value(s)
o Test statistic: $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
▪ 𝑑 – difference between the two values in a matched pair
▪ $\bar{d}$ – average difference between matched pairs in sample
▪ 𝜇𝑑 – average difference between matched pairs in population,
determined by 𝐻0
▪ 𝑠𝑑 – standard deviation of the differences of matched pairs
▪ 𝑛 – number of pairs
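For matched pairs, the statistic is computed from the list of differences. The Python sketch below illustrates this; the function name, the default value of mu_d, and the before/after values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def matched_pairs_t_statistic(first, second, mu_d=0):
    """t test statistic for matched pairs; mu_d is the mean difference stated in H0."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)  # number of pairs
    return (mean(differences) - mu_d) / (stdev(differences) / sqrt(n))

before = [10, 12, 15, 11]
after = [8, 11, 12, 10]
print(matched_pairs_t_statistic(before, after))
```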
Module 6
Linear Correlation and Regression Analysis
• Linear Correlation
o A correlation exists if the values of one variable are somehow associated
with the values of the other variable
▪ Correlation does not imply causation
• Correlation Coefficient
o We use the variable 𝑟 to measure the strength of linear correlation.
▪ In Excel =CORREL(x data, y data)
o Whenever $|r| < \text{Critical Value}$ we conclude that there is no significant linear correlation.
▪ Critical value is determined by the significance level and the sample size. See Appendix A in textbook for table of critical values of correlation coefficients.
o If $\text{Critical Value} \le r \le 1$ we say there is a positive linear correlation.
o If $-1 \le r \le -\text{Critical Value}$ we say there is a negative linear correlation.
• Coefficient of determination
o The proportion of variance in the dependent variable that is predictable from the independent variable.
$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}$
▪ The coefficient of determination is also simply the square of the correlation coefficient.
• Regression Equation
o $\hat{y} = b_0 + b_1 x$
▪ 𝑥 – independent variable
▪ $\hat{y}$ – best predicted value
▪ 𝑏0 – y-intercept
▪ In Excel =INTERCEPT(y data, x data)
▪ 𝑏1 – slope
▪ In Excel, =SLOPE(y data, x data)
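The correlation coefficient, coefficient of determination, intercept, and slope can be computed together in Python. The sketch below uses the standard least-squares formulas, which this guide does not state explicitly; the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def correlation_and_regression(x, y):
    """Return (r, r_squared, b0, b1) for paired data x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)   # comparable to =CORREL(x data, y data)
    b1 = sxy / sxx              # comparable to =SLOPE(y data, x data)
    b0 = y_bar - b1 * x_bar     # comparable to =INTERCEPT(y data, x data)
    return r, r ** 2, b0, b1

r, r_squared, b0, b1 = correlation_and_regression([1, 2, 3], [2, 4, 3])
print(r, r_squared, b0, b1)
```

The best predicted value for a given 𝑥 is then b0 + b1 * x.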
• Outliers and Influential Points
o An outlier is a point that is far away from other data points.
o An influential point is a point that strongly affects the regression line.