statistic 1

Key Concepts and Review

Review of Basic Statistics

Module 1

• Classifying Data

o Qualitative Data

▪ Consists of names or labels, categorical data

o Quantitative Data

▪ Consists of numeric values, includes units of measurement

• Discrete

o Only takes on countable values

• Continuous

o Can take on any value within an interval

• Levels of Measure

o Nominal

▪ Categories only, data cannot be arranged in order

o Ordinal

▪ Data can be arranged in order, but differences either can’t be found or

are meaningless

o Interval

▪ Differences are meaningful but there is no natural zero starting point

and ratios are meaningless

o Ratio

▪ There is a natural zero starting point and ratios make sense

• Measures of Center

o Mean

▪ The arithmetic average of all data points

▪ In Excel =AVERAGE(data)

o Median

▪ The middle of an ordered data set

▪ In Excel =MEDIAN(data)

o Mode

▪ The most frequently appearing data

▪ In Excel =MODE.MULT(data). To get multiple outputs, highlight

several cells in a single column, type the formula in the formula bar,

then press Ctrl+Shift+Enter at the same time.

• Measures of Variation

o Range

▪ The distance between the lowest and highest value

▪ In Excel, =MAX(data) - MIN(data)

o Variance

▪ Equal to the square of the standard deviation

▪ =VAR.S(data)

o Standard Deviation

▪ A measurement of how far the data deviates from the mean

▪ =STDEV.S(data)
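
The measures of center and variation above can be sketched with Python's standard `statistics` module. This is only an illustration: the sample values are made up, and the comments map each call to the Excel formula it roughly corresponds to.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 9, 10]

mean = statistics.mean(data)          # comparable to =AVERAGE(data)
median = statistics.median(data)      # comparable to =MEDIAN(data)
modes = statistics.multimode(data)    # comparable to =MODE.MULT(data); returns every mode
data_range = max(data) - min(data)    # comparable to =MAX(data) - MIN(data)
variance = statistics.variance(data)  # sample variance, comparable to =VAR.S(data)
std_dev = statistics.stdev(data)      # sample standard deviation, comparable to =STDEV.S(data)

print(mean, median, modes, data_range, variance, std_dev)
```

Squaring `std_dev` recovers `variance`, which matches the definition of variance given above.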

Module 2

Normal Distributions

o A bell curve distribution whose shape is determined by a mean and a standard

deviation

o Standard Normal Distribution and 𝑧-scores

▪ When the mean is 0 and the standard deviation is 1 it is called a

standard normal distribution. Measurements on this scale are identified

by the variable 𝑧

o Find a 𝑧-score

▪ Find a z-score from a probability with =NORM.S.INV(probability)

o Area and probability

▪ Find a probability from a z-score with =NORM.S.DIST(z, TRUE)
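
As a rough cross-check of the two Excel formulas above, Python's `statistics.NormalDist` offers the same pair of conversions. The probability 0.975 and the z-score 1.96 are arbitrary illustrative inputs.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

# z-score from a left-tail probability, comparable to =NORM.S.INV(0.975)
z = std_normal.inv_cdf(0.975)

# Left-tail probability from a z-score, comparable to =NORM.S.DIST(1.96, TRUE)
p = std_normal.cdf(1.96)

print(round(z, 3), round(p, 3))
```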

Module 3

• Parameters and Statistics

o A parameter is a measurement (usually a proportion, a mean, or a standard

deviation) of an entire population. Usually these values are either impossible

or unrealistic to find.

o A statistic is a measurement used to estimate a parameter that is based on a

sample taken from a population.

Measurement          Parameter              Statistic
Mean (average)       Mu (𝜇)                 x-bar ($\bar{x}$)
Proportion           𝑝                      p-hat ($\hat{p}$)
Standard Deviation   Sigma (𝜎)              𝑠
Variance             Sigma squared (𝜎²)     𝑠²

Confidence Intervals

o A confidence interval is a range of values used to estimate a population parameter. It is constructed by taking a point estimate obtained from a sample and then calculating a margin of error about that point estimate.

o The point estimate is a single value used to estimate a population parameter.

Each parameter type has its own best point estimate statistic.

o Margins of error are computed differently depending on the population

parameter being estimated.

o A confidence level (1 − 𝛼) is the proportion of times that the confidence interval procedure, repeated over many samples, produces an interval that contains the population parameter.

• Constructing Confidence Intervals for a Population Proportion 𝒑.

o Best point estimate: $\hat{p} = \dfrac{x}{n}$

o Critical Value: $z_{\alpha/2}$

▪ =NORM.S.INV(1-alpha/2)

o Margin of Error: $E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$

o Confidence Interval: $\hat{p} - E < p < \hat{p} + E$
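
The steps for a proportion interval can be sketched in Python as follows. The function name and the sample counts (60 successes in 100 trials) are hypothetical; the critical value comes from `statistics.NormalDist`, playing the role of NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence=0.95):
    """Return (lower, upper) bounds for a population proportion."""
    alpha = 1 - confidence
    p_hat = x / n                                 # best point estimate
    q_hat = 1 - p_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z_crit * sqrt(p_hat * q_hat / n)     # margin of error E
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 60 successes out of 100 trials, 95% confidence.
low, high = proportion_confidence_interval(x=60, n=100)
print(round(low, 3), round(high, 3))
```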

• Constructing Confidence Intervals for a Population Mean 𝝁.

o Best Point Estimate: $\bar{x}$

o Critical Value: $t_{\alpha/2}$

▪ =T.INV(1-alpha/2, n-1)

o Margin of Error: $E = t_{\alpha/2} \dfrac{s}{\sqrt{n}}$

o Confidence Interval: $\bar{x} - E < \mu < \bar{x} + E$
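
A minimal Python sketch of the interval for a mean is shown below. Python's standard library has no t-distribution, so the sketch assumes the critical value is looked up separately (for example with T.INV in Excel) and passed in. The function name and the sample figures are hypothetical.

```python
from math import sqrt


def mean_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) bounds for a population mean.

    t_crit is the critical value t_{alpha/2} with n - 1 degrees of freedom,
    supplied by the caller.
    """
    margin = t_crit * s / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin


# Hypothetical sample: n = 25, sample mean 50, sample standard deviation 10.
# For 95% confidence and 24 degrees of freedom, t_{alpha/2} is about 2.064.
low, high = mean_confidence_interval(x_bar=50, s=10, n=25, t_crit=2.064)
print(round(low, 3), round(high, 3))
```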

• Constructing Confidence Intervals for a Population Standard Deviation 𝝈.

o Best Point Estimate: 𝑠

o Left Critical Value: $\chi^2_L$ (Chi-squared left)

▪ =CHISQ.INV(alpha/2, n-1)

o Right Critical Value: $\chi^2_R$ (Chi-squared right)

▪ =CHISQ.INV.RT(alpha/2, n-1)

o Confidence Interval: $\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$
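
The interval for a standard deviation can be sketched the same way. As an assumption of this sketch, both chi-squared critical values are obtained elsewhere (for example from CHISQ.INV and CHISQ.INV.RT) and passed in, since the Python standard library has no chi-squared distribution. The function name and sample figures are hypothetical.

```python
from math import sqrt


def std_dev_confidence_interval(s, n, chi2_left, chi2_right):
    """Return (lower, upper) bounds for a population standard deviation."""
    numerator = (n - 1) * s ** 2
    # The larger (right) critical value produces the lower bound.
    return sqrt(numerator / chi2_right), sqrt(numerator / chi2_left)


# Hypothetical sample: n = 25, s = 10. For 95% confidence and 24 degrees of
# freedom, the left and right critical values are about 12.401 and 39.364.
low, high = std_dev_confidence_interval(s=10, n=25, chi2_left=12.401, chi2_right=39.364)
print(round(low, 2), round(high, 2))
```

Unlike the intervals for a proportion or a mean, this interval is not symmetric about the point estimate 𝑠.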

Module 4

Hypothesis Test Overview

• Steps of Hypothesis Test

o Step 1: Identify the claim to be tested and present it in symbolic form.

o Step 2: Write the null and alternative hypotheses.

o Step 3: Determine the significance level.

o Step 4: Identify the test statistic and the direction of the test.

o Step 5: Determine the decision criteria (Critical value method or P-Value

method)

▪ Critical Value Method: If the test statistic is more extreme than the

critical value(s), reject the null hypothesis.

▪ P-Value Method: If the P-value is less than or equal to 𝛼, reject the

null hypothesis.

o Step 6: Find the test statistic, then:

▪ Find the P-Value OR

▪ Find the critical value

o Step 7: Make a decision on the null hypothesis (reject or do not reject)

o Step 8: State a conclusion about the original claim (the evidence does

support the claim or the evidence does not support the claim)

• Null Hypothesis

o ALWAYS uses an equal sign

• Alternative Hypothesis

o Claim uses ‘equal to’ type language

▪ 𝐻0 uses an equal sign

▪ 𝐻𝑎 uses a not equal to sign

▪ Two-tailed test

▪ Decision Criteria

▪ If the test statistic is less than the negative critical value or

greater than the positive critical value reject 𝐻0 and conclude

that the evidence does not support the claim.

▪ If the test statistic is between the two critical values do not reject 𝐻0 and conclude that there is not sufficient evidence to reject the claim.

o Claim uses ‘not equal to’ type language

▪ 𝐻0 uses an equal sign

▪ 𝐻𝑎 uses a not equal to sign

▪ Two-tailed test

▪ Decision Criteria

▪ If the test statistic is less than the negative critical value or

greater than the positive critical value reject 𝐻0 and conclude

that the evidence supports the claim.

▪ If the test statistic is between the two critical values do not

reject 𝐻0 and conclude that the evidence does not support

the claim.

o Claim uses ‘less than’ type language

▪ 𝐻0 uses an equal sign

▪ 𝐻𝑎 uses a less than sign

▪ Left-tailed test

▪ Decision Criteria

▪ If the test statistic is less than the critical value reject 𝐻0 and

conclude that the evidence supports the claim.

▪ If the test statistic is greater than the critical value do not

reject 𝐻0 and conclude that the evidence does not support

the claim

o Claim uses ‘greater than’ type language

▪ 𝐻0 uses an equal sign

▪ 𝐻𝑎 uses a greater than sign

▪ Right-tailed test

▪ Decision Criteria

▪ If the test statistic is greater than the critical value then reject

𝐻0 and conclude that the evidence supports the claim

▪ If the test statistic is less than the critical value do not reject

𝐻0 and conclude that the evidence does not support the

claim
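
The critical value decision criteria above can be condensed into one small Python function. This is a sketch for 𝑧 and 𝑡 statistics only, where the two-tailed critical values are symmetric about zero; a two-tailed chi-squared test has two different critical values and is not covered. The function name and the tail labels are illustrative.

```python
def reject_null(test_stat, critical_value, tail):
    """Critical value method: return True when H0 should be rejected.

    tail is "left", "right", or "two". For one-tailed tests pass the signed
    critical value; for a two-tailed test pass the positive critical value.
    """
    if tail == "left":
        return test_stat < critical_value
    if tail == "right":
        return test_stat > critical_value
    if tail == "two":
        # Reject below the negative or above the positive critical value.
        return abs(test_stat) > abs(critical_value)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical test statistics checked against common z critical values.
print(reject_null(-2.1, -1.645, "left"))   # left-tailed, alpha = 0.05
print(reject_null(1.2, 1.645, "right"))    # right-tailed, alpha = 0.05
print(reject_null(-2.5, 1.96, "two"))      # two-tailed, alpha = 0.05
```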

• Critical Value Excel Formulas

o Left-tailed:

▪ 𝑡 statistic: =T.INV(alpha, d.o.f.)

▪ 𝑧 statistic: =NORM.S.INV(alpha)

▪ 𝜒² statistic: =CHISQ.INV(alpha, d.o.f.)

o Right-tailed:

▪ 𝑡 statistic: =T.INV(1-alpha, d.o.f.)

▪ 𝑧 statistic: =NORM.S.INV(1-alpha)

▪ 𝜒² statistic: =CHISQ.INV.RT(alpha, d.o.f.)

o Two-tailed:

▪ 𝑡 statistic: =T.INV(1-alpha/2, d.o.f.) and then uses both positive and

negative values

▪ 𝑧 statistic: =NORM.S.INV(1-alpha/2) and then uses both positive

and negative values

▪ 𝜒² statistic: both =CHISQ.INV(alpha/2, d.o.f.) and

=CHISQ.INV.RT(alpha/2, d.o.f.)

• P-Value Excel Formulas

o Left-tailed:

▪ 𝑡 statistic: =T.DIST(test stat, d.o.f., TRUE)

▪ 𝑧 statistic: =NORM.S.DIST(test stat, TRUE)

▪ 𝜒² statistic: =CHISQ.DIST(test stat, d.o.f., TRUE)

o Right-tailed:

▪ 𝑡 statistic: =1-T.DIST(test stat, d.o.f., TRUE)

▪ 𝑧 statistic: =1-NORM.S.DIST(test stat, TRUE)

▪ 𝜒² statistic: =CHISQ.DIST.RT(test stat, d.o.f.)

o Two-tailed: use the left-tailed formula shown above for a negative test statistic or the right-tailed formula for a positive test statistic, then multiply the result by 2
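
For a 𝑧 statistic, the P-value formulas can be sketched in Python with `statistics.NormalDist`. The function name and tail labels are illustrative. The two-tailed branch doubles the smaller tail area, which is the usual convention for a symmetric distribution.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """Return the P-value for a z test statistic; tail is "left", "right", or "two"."""
    left_area = NormalDist().cdf(z)  # comparable to =NORM.S.DIST(z, TRUE)
    if tail == "left":
        return left_area
    if tail == "right":
        return 1 - left_area
    if tail == "two":
        # Twice the area in the tail beyond the test statistic.
        return 2 * min(left_area, 1 - left_area)
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical test statistic z = 1.96.
print(round(z_p_value(1.96, "right"), 3))
print(round(z_p_value(1.96, "two"), 3))
```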

• Test Statistic and Critical Value Types

o The test statistic will always be the same type of statistic as the critical

value. Which one you use depends on the parameter being tested.

▪ Claims about a mean will use a 𝑡 statistic and the T. formulas for

critical values and P-Values.

▪ Claims about a proportion will use a 𝑧 statistic and the NORM.S

formulas for critical values and P-Values.

▪ Claims about a standard deviation will use a 𝜒² statistic and the CHISQ. formulas for critical values and P-Values.

Test Statistic Formulas

• Tests about a Population Proportion

o Use a 𝑧 test statistic and a 𝑧 critical value(s)

o $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$, where $q = 1 - p$

▪ 𝑝 is determined by 𝐻0

• Tests about a Population Mean

o Use a 𝑡 test statistic and a 𝑡 critical value(s)

o $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

▪ 𝜇 is determined by 𝐻0

• Tests about a Population Standard Deviation

o Use a 𝜒² (chi-squared) test statistic and a 𝜒² critical value(s)

o $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$

▪ 𝜎 is determined by 𝐻0
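
The three one-sample test statistics above translate directly into short Python functions. The function and parameter names are hypothetical; `p0`, `mu0`, and `sigma0` stand for the values claimed under 𝐻0, and the example inputs are made up.

```python
from math import sqrt


def z_stat_proportion(p_hat, p0, n):
    """z test statistic for a claim about a population proportion."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def t_stat_mean(x_bar, mu0, s, n):
    """t test statistic for a claim about a population mean."""
    return (x_bar - mu0) / (s / sqrt(n))


def chi2_stat_std_dev(s, sigma0, n):
    """Chi-squared test statistic for a claim about a population standard deviation."""
    return (n - 1) * s ** 2 / sigma0 ** 2


# Hypothetical inputs for each test.
print(z_stat_proportion(p_hat=0.6, p0=0.5, n=100))
print(t_stat_mean(x_bar=52, mu0=50, s=10, n=25))
print(chi2_stat_std_dev(s=12, sigma0=10, n=25))
```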

Module 5

• Comparing Two Population Proportions

o Use a 𝑧 test statistic and a 𝑧 critical value(s)

o Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\bar{q}}{n_1} + \dfrac{\bar{p}\bar{q}}{n_2}}}$

▪ $\hat{p}_1 = \dfrac{x_1}{n_1}$, sample proportion from 1st sample

▪ $\hat{p}_2 = \dfrac{x_2}{n_2}$, sample proportion from 2nd sample

▪ 𝑝1 – population proportion from 1st population

▪ 𝑝2 – population proportion from 2nd population

▪ From 𝐻0, assume 𝑝1 = 𝑝2, i.e. 𝑝1 − 𝑝2 = 0

▪ $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$, pooled proportion

▪ $\bar{q} = 1 - \bar{p}$

▪ 𝑛1 – sample size from 1st population

▪ 𝑛2 – sample size from 2nd population
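
A minimal Python sketch of the two-proportion statistic, assuming 𝐻0 states 𝑝1 = 𝑝2 so that the (𝑝1 − 𝑝2) term is zero. The function name and the sample counts are hypothetical.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z test statistic for comparing two proportions under H0: p1 = p2."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    q_bar = 1 - p_bar
    standard_error = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p_hat1 - p_hat2) / standard_error


# Hypothetical samples: 60 of 100 successes versus 40 of 100 successes.
z = two_proportion_z(x1=60, n1=100, x2=40, n2=100)
print(round(z, 3))
```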

• Comparing Two Means, Independent Samples

o Use a 𝑡 test statistic and a 𝑡 critical value(s)

o Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

▪ $\bar{x}_1$ – sample mean from 1st sample

▪ $\bar{x}_2$ – sample mean from 2nd sample

▪ 𝜇1 – population mean from 1st population

▪ 𝜇2 – population mean from 2nd population

▪ From 𝐻0, assume 𝜇1 = 𝜇2, i.e. 𝜇1 − 𝜇2 = 0

▪ 𝑠1 – sample standard deviation from 1st sample

▪ 𝑠2 – sample standard deviation from 2nd sample

▪ 𝑛1 – sample size from 1st population

▪ 𝑛2 – sample size from 2nd population
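
The two-sample statistic can be sketched in Python as below, again assuming 𝐻0 states 𝜇1 = 𝜇2. The notes do not say how to choose the degrees of freedom for the critical value, so the sketch stops at the test statistic. The function name and sample summaries are hypothetical.

```python
from math import sqrt


def two_mean_t(x_bar1, s1, n1, x_bar2, s2, n2):
    """t test statistic for two independent samples under H0: mu1 = mu2."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x_bar1 - x_bar2) / standard_error


# Hypothetical summaries: means 52 and 48, both with s = 10 and n = 25.
t = two_mean_t(x_bar1=52, s1=10, n1=25, x_bar2=48, s2=10, n2=25)
print(round(t, 3))
```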

• Comparing Dependent Samples, Matched Pairs

o Use a 𝑡 test statistic and a 𝑡 critical value(s)

o Test statistic: $t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$

▪ 𝑑 – difference between two matched pairs

▪ $\bar{d}$ – average difference between matched pairs in sample

▪ 𝜇𝑑 – average difference between matched pairs in population,

determined by 𝐻0

▪ 𝑠𝑑 – standard deviation of the differences of matched pairs

▪ 𝑛 – number of pairs
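
A short Python sketch of the matched-pairs statistic follows. The function name and the before/after measurements are hypothetical, and `mu_d` defaults to 0, the usual value claimed under 𝐻0.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second, mu_d=0):
    """t test statistic for matched pairs; first and second are paired lists."""
    differences = [a - b for a, b in zip(first, second)]
    d_bar = mean(differences)  # average difference in the sample
    s_d = stdev(differences)   # sample standard deviation of the differences
    n = len(differences)       # number of pairs
    return (d_bar - mu_d) / (s_d / sqrt(n))


# Hypothetical before/after measurements for four subjects.
before = [10, 12, 9, 11]
after = [8, 11, 9, 8]
t = matched_pairs_t(before, after)
print(round(t, 3))
```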

Module 6

Linear Correlation and Regression Analysis

• Linear Correlation

o A correlation exists if the values of one variable are somehow associated with the values of the other variable

▪ Correlation does not imply causation

• Correlation Coefficient

o We use the variable 𝑟 to measure the strength of linear correlation.

▪ In Excel =CORREL(x data, y data)

o Whenever $|r| < \text{critical value}$ we conclude that there is no significant linear correlation.

▪ Critical value is determined by the significance level and the sample size. See Appendix A in textbook for table of critical values of correlation coefficients.

o If $\text{critical value} \le r \le 1$ we say there is a positive linear correlation.

o If $-1 \le r \le -\text{critical value}$ we say there is a negative linear correlation.

• Coefficient of determination

o The proportion of variance in the dependent variable that is predictable from the independent variable: $r^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}}$

▪ The coefficient of determination is also simply the square of the correlation coefficient.

• Regression Equation

o $\hat{y} = b_0 + b_1 x$

▪ 𝑥 – independent variable

▪ $\hat{y}$ – best predicted value

▪ 𝑏0 – y-intercept

▪ In Excel =INTERCEPT(y data, x data)

▪ 𝑏1 – slope

▪ In Excel, =SLOPE(y data, x data)
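
The correlation coefficient, coefficient of determination, and regression coefficients can be computed together from sums of squared deviations. The sketch below uses plain Python with a hypothetical function name and a made-up data set; it should give the same values as CORREL, INTERCEPT, and SLOPE for the same data.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Return (r, r_squared, b0, b1) for paired data xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # y-intercept
    return r, r ** 2, b0, b1


# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, r_squared, b0, b1 = linear_fit(xs, ys)
print(round(r, 4), round(r_squared, 2), round(b0, 2), round(b1, 2))
```

The best predicted value for a given 𝑥 is then `b0 + b1 * x`.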

• Outliers and Influential Points

o An outlier is a point that is far away from other data points.

o An influential point is a point that strongly affects the regression line.