Unit 7

Hypothesis Testing: The One Sample Case

Unit Objectives:

 Explain the logic of hypothesis testing, including the concepts of the null hypothesis, the

sampling distribution, the alpha level, and the test statistic.

 Explain what it means to “reject the null hypothesis” or “fail to reject the null

hypothesis.”

 Identify and cite examples of situations in which one-sample tests of hypotheses are

appropriate.

 Test the significance of single-sample means and proportions using the five-step model,

and correctly interpret the results.

 Explain the difference between one- and two-tailed tests, and specify when each is

appropriate.

 Define and explain Type I and Type II errors, and relate each to the selection of an alpha

level.

 Use the Student’s t distribution to test the significance of a sample mean for a small

sample.

Unit Outline

 Using Statistics

 An Overview of Hypothesis Testing

 The Five-Step Model for Hypothesis Testing

 Choosing a One-Tailed or Two-Tailed Test

 Selecting an Alpha Level

 The Student’s t Distribution

 Tests of Hypotheses for Single-Sample Proportions (Large Samples)

Using Statistics

 One-sample hypothesis tests are used to answer research questions. For example:

 Are residents of a particular neighborhood more likely to use a park than all

residents of the city?

 Do student athletes consume less alcohol than all other college students?

 Does a program designed to reduce sexual risk taking among teenagers work?

 Hypothesis testing (or significance testing) is the second branch of inferential statistics.

 Hypothesis testing is used when we have a probability sample that we want to compare to

a population.

 Hypothesis testing is used when we want to know if the group represented by the sample

is different from the population on some characteristic.

 A significant finding means that the difference between the sample and the population on

a particular characteristic is very unlikely to be caused by random chance alone.

There is always a small amount of uncertainty in our conclusions. However, an advantage of

inferential statistics is that we know the probability of being wrong and can judge results

accordingly.

An Overview of Hypothesis Testing

 Example:  A sociologist is assessing the effectiveness of a rehabilitation program for

alcoholics in her city. The program serves a large area, and she cannot test every

single client. Instead, she draws a random sample of 127 people from the list of

all clients and questions them on a variety of issues. She notices that, on average,

the people in her sample miss fewer days of work each year than workers in the

city as a whole. On the basis of this unexpected finding, she decides to conduct a

test to see if the workers in her sample really are more reliable than workers in

the community as a whole.

 Our research question is: Are people treated in this program more

reliable workers than people in the community?

 The sample of program clients (mean=6.8 missed days/year) appears to be different

from the city population (mean=7.2 missed days/year).

 Is the mean for our sample of program clients different enough from the mean for

the city population for us to conclude that the difference is not due to chance?

Testing Hypotheses in Five Steps

 The Five-Step Model:

 Step 1: Make assumptions and meet test requirements.

 Step 2: State the null hypothesis.

 Step 3: Select the sampling distribution and establish the critical

region.

 Step 4: Compute the test statistic.

 Step 5: Make a decision and interpret the results.

Steps 1-5

 Step 1: Make Assumptions and Meet Test Requirements

 All statistical tests must meet certain assumptions prior to their use.

For hypothesis testing, these include:

 A probability (EPSEM) sample

 An interval-ratio level variable – so that calculation of a

mean is appropriate

 A normal sampling distribution – so that we can use the

normal curve table to find areas and probabilities

 The diagram below can be used to represent our hypothesis test.

 Step 2: State the Null Hypothesis

 The null hypothesis is a statement of “no difference.”

 For one-sample hypothesis tests, the null hypothesis states that the

sample comes from a population with a certain characteristic.

 The null hypothesis usually takes the form of Ho: μ = some number.

 We also usually state the research hypothesis, which directly

contradicts the null hypothesis, and represents the researcher’s

expectations.

 The research hypothesis usually takes the form of H1: μ ≠ some number.

 Why might our sample of program clients differ from the population of city residents?

 The difference between the two means reflects a real difference between the two

groups (of program clients and city residents).

 That is, the difference is statistically significant.

 The difference between the two means was caused by random chance. The

respective groups are not different.

 This reason is represented by the null hypothesis (Ho).

 In this example, the null would be written as Ho: μ=7.2.

 Hypothesis tests assume that the null hypothesis (no difference) is true and

that the sample results were due to chance.

 Step 3: Select the Sampling Distribution and Establish the Critical Region

 Before we find the probability of our sample outcome, we have to first establish a

threshold of difference.

 That is, exactly how different must our sample mean (of program clients) be from the hypothesized population mean (of program clients) for us to decide that the difference is real rather than due to chance?

Often, we use a threshold of 0.05. If there is less than a 5% chance of getting a sample mean at least as far from 7.2 as 6.8 is, given that the population mean really is 7.2, then we reject the null hypothesis.

 Given the assumption of a true null hypothesis, we can estimate the probability of

obtaining our sample result (with a mean of 6.8 absences).

 That is, if the population of program clients really has the same mean

as the city population (7.2), what is the probability that we could have

selected a sample of clients from the population of all clients that has a

mean of 6.8?

 Using the sampling distribution, the normal curve, and the Central

Limit Theorem, we can determine this probability.

 In symbolic form, our threshold is α = 0.05.

- Here the 0.05 area is divided in half across the two tails of the sampling distribution (0.025 in each tail). The Z score that marks off this area is ±1.96, so Z(critical) = ±1.96.

- If our sample outcome falls in either tail (the critical region or rejection region), we reject the null hypothesis: a difference that large is very unlikely to be due to chance alone.

 Step 4: Compute the Test Statistic

 Convert the sample outcome to a standardized Z score.

 This statistic is called the Z(obtained).

- To locate our sample mean in the sampling distribution we must standardize it into a Z score.

- Previously, we converted single raw values into Z scores by subtracting the mean and dividing by the sample standard deviation.

- To standardize a sample mean into a Z score for a sampling distribution, rather than an empirical distribution, we must compare it to the population mean and divide by the standard error.

 For a one-sample hypothesis test of a mean (large sample) use:

$$Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} \quad \text{when } \sigma \text{ is known}$$

$$Z(\text{obtained}) = \frac{\bar{X} - \mu}{s / \sqrt{N - 1}} \quad \text{when } \sigma \text{ is unknown}$$


- Using this formula, the Z score for our sample mean of 6.8 is -3.15. Our sample mean

falls 3.15 standard errors below our hypothesized population mean of 7.2.
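
The Z(obtained) calculation for a sample mean can be sketched in Python as follows. This is only an illustration: the unit gives the sample mean (6.8), the hypothesized population mean (7.2), and N (127), but not the sample standard deviation, so the value s = 1.43 below is an assumption chosen to be consistent with the reported result. The function name is also hypothetical.

```python
import math

def z_obtained_mean(sample_mean, pop_mean, s, n):
    """Z(obtained) for a one-sample test of a mean when sigma is unknown.

    The standard error is estimated as s / sqrt(N - 1).
    """
    standard_error = s / math.sqrt(n - 1)
    return (sample_mean - pop_mean) / standard_error

# s = 1.43 is an ASSUMED value (it is not stated in this unit).
z = z_obtained_mean(sample_mean=6.8, pop_mean=7.2, s=1.43, n=127)
print(round(z, 2))
```

Carried at full precision, this comes out at about -3.14; rounding the standard error to 0.127 before dividing gives the -3.15 reported above. Either way, the sample mean lies a little more than three standard errors below 7.2.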

 Step 5: Make a Decision and Interpret the Results

- Compare the Z(obtained) to the Z(critical).

- If the Z(obtained) falls in the critical region, then reject the null hypothesis. The results suggest a significant difference; the sample results reflect a true difference.

- If the Z(obtained) does not fall in the critical region, then fail to reject the null hypothesis. The difference is not statistically significant; the sample results could be due to chance.

 Now we can locate our sample mean on the sampling distribution.

 Because our sample mean falls in the shaded region, we can conclude that it is

highly unlikely that, given a mean of 7.2 for the population of program clients, we

could have selected a sample that had a mean of 6.8.

 Thus, we reject the null hypothesis and conclude that program clients are

significantly different from city residents on absenteeism.

 Of course, if the null hypothesis is actually true, there is a 5% chance that we have rejected it by mistake. This would be called a Type I error.
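
The decision rule in Step 5 can be sketched in a few lines of Python. The sketch uses the Z(obtained) of -3.15 and the Z(critical) of ±1.96 from the example; the function name `decide` is hypothetical, and the probability line uses `NormalDist` from Python's standard `statistics` module as a stand-in for the normal curve table.

```python
from statistics import NormalDist

def decide(z_obtained, z_critical):
    """Two-tailed rule: reject Ho if Z(obtained) falls beyond +/- Z(critical)."""
    if abs(z_obtained) > abs(z_critical):
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

decision = decide(z_obtained=-3.15, z_critical=1.96)
print(decision)  # reject the null hypothesis

# Two-tailed probability of a Z score at least this extreme if Ho is true.
p_value = 2 * NormalDist().cdf(-abs(-3.15))
print(p_value < 0.05)  # True
```

The probability is well below the 0.05 threshold, which is the same conclusion reached by comparing -3.15 with ±1.96.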

Choosing a One-Tailed or Two-Tailed Test

 The decision to use a one-tailed or two-tailed test is reflected in the research

hypothesis.

 In a two-tailed test the research hypothesis looks like this:

H1: μ ≠ (some number)

 In a one-tailed test the research hypothesis looks like this:

H1: μ > (some number) or H1: μ < (some number)

 Because the critical region differs across one-tailed and two-tailed tests, the

Z(critical) will also change.
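
To see how Z(critical) changes between the two kinds of test, here is a minimal sketch that reads the critical value from the standard normal curve instead of a printed table. It assumes Python's standard `statistics.NormalDist`; the function name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Positive Z score that marks off the critical region for a given alpha."""
    # A two-tailed test splits alpha across both tails; a one-tailed test
    # places all of alpha in a single tail.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

print(round(z_critical(0.05, two_tailed=True), 2))   # 1.96
print(round(z_critical(0.05, two_tailed=False), 3))  # 1.645
```

For a two-tailed test the critical region lies beyond both +Z(critical) and -Z(critical). For a one-tailed test the sign follows the direction stated in H1. Printed tables often round 1.645 to 1.65.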

Selecting an Alpha Level

 Setting an alpha level requires defining what an “unlikely” sample outcome is.

 The most common alpha level is 0.05, but 0.10, 0.01, and 0.001 are

also used.

 There are two possible mistakes:

 Type I Errors: Rejecting a null hypothesis that is really true.

 Type II Errors: Failing to reject a null hypothesis that is really false.

 The probability of committing a Type I Error is alpha. The lower we set alpha, the lower the probability of committing a Type I Error.

 However, as the probability of committing a Type I Error decreases, the probability of

committing a Type II Error increases.

 But as social scientists, we are usually more concerned with Type I Error than

Type II Error.

 An alpha of 0.05 is a widely accepted balance between these two error probabilities.
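
The claim that the probability of a Type I error equals alpha can be checked with a small simulation. The sketch below draws many samples from a population in which the null hypothesis is true and counts how often a two-tailed Z test rejects it anyway. The population values (mean 7.2, standard deviation 1.4), the sample size, and the number of trials are all assumptions made for illustration.

```python
import math
import random

def type_i_error_rate(z_crit, trials=2000, n=100, mu=7.2, sigma=1.4, seed=1):
    """Share of samples that reject a TRUE null hypothesis (two-tailed Z test)."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Every sample really does come from a population with mean mu.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate_05 = type_i_error_rate(1.96)   # Z(critical) for alpha = 0.05
rate_01 = type_i_error_rate(2.576)  # Z(critical) for alpha = 0.01
print(rate_05, rate_01)
```

The two rates should land near 0.05 and 0.01, so lowering alpha lowers the share of true null hypotheses that are rejected. The simulation says nothing about Type II errors, which require a population in which the null hypothesis is false.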

The Z Distribution and the Student’s t Distribution

 For large samples of 100 or more, we use the Z Distribution.

 In these cases, the sample standard deviation is a good estimate of the

population standard deviation.

 However, when substituting the sample standard deviation for the

population standard deviation, we use N-1 in the denominator instead

of N (see the formula in Slide 21).

 For smaller samples of fewer than 100, we use the Student’s t Distribution.

 The Student’s t distribution is flatter than the Z distribution.

 The Student’s t distribution varies with sample size. The larger the

sample, the closer the t distribution is to the Z distribution.

The Student’s t Distribution

 The Student’s t Distribution is displayed differently than the Z table.

 For the t distribution, we need to calculate degrees of freedom (df). For one-sample hypothesis tests, df is equal to N-1.

 The numbers in the body of the t table are not areas, but critical values of the test

statistic, or t(critical).

 When conducting a one-sample hypothesis test with a small sample (N<100), Step 3 of the hypothesis test will change: the sampling distribution is the t distribution, and t(critical) is looked up using df = N-1.

 In Step 5 you will compare the computed test statistic to the value from the t table.

 Practice finding the critical t value when the sample size is 20, so that the df (N-1) is equal to 19.

 For a One-tailed test, using the .05 significance level, the Critical t would be

1.729.

 For a Two-tailed test, using the .05 significance level, the Critical t would be

2.093.
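
The table values above can be reproduced numerically. The sketch below is one possible approach using only Python's standard library: it integrates the Student's t density with Simpson's rule and then searches for the t score that leaves the required area in the tail. The function names are hypothetical, and a printed t table (or a statistics package) would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t (for t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df, two_tailed=True):
    """Positive t score that marks off the critical region (found by bisection)."""
    tail_area = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid  # too much area remains in the tail, so move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, df=19, two_tailed=False), 3))  # 1.729
print(round(t_critical(0.05, df=19, two_tailed=True), 3))   # 2.093
```

Trying larger df values shows the critical t shrinking toward the Z values of 1.645 and 1.96, which matches the statement that the t distribution approaches the Z distribution as sample size grows.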

Tests of Hypotheses for Single-Sample Proportions (Large Samples)

 One-sample hypothesis tests of proportions use the same five steps as tests of

means, with a couple of changes:

 In Step 2 your null and research hypotheses will reference the population proportion (Pu).

 In Step 3 you will always use the Z distribution (Appendix A) to look

up the Z(critical).

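 In Step 4 you will calculate the Z(obtained) by dividing the difference between the sample proportion (Ps) and the population proportion (Pu) by the standard error, √(Pu(1 − Pu)/N).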

 Example: A random sample of 122 households in a low-income neighborhood

revealed that 53 (or a proportion of 0.43) of the households were headed by women.

In the city as a whole, the proportion of female-headed households is 0.39. Are

households in the lower-income neighborhoods significantly different from the city

as a whole in terms of this characteristic?

 Step 3: Select the Sampling Distribution and Establish the Critical Region

 Use the Z distribution

 Alpha = 0.10, two-tailed

 Z(critical) = ±1.65

 Step 4: Compute the Test Statistic

$$Z(\text{obtained}) = \frac{P_s - P_u}{\sqrt{P_u(1 - P_u)/N}} = \frac{0.43 - 0.39}{\sqrt{(0.39)(0.61)/122}} \approx +0.91$$

 Step 5: Make a Decision and Interpret the Results

 The Z(obtained) does not fall in the critical region.

 Fail to reject the null hypothesis.

 Conclude that there is no statistically significant difference between the low-income neighborhoods and the city as a whole in terms of female-headed households.
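
The proportion test in this example can be sketched in Python as a check on the arithmetic. The values 0.43, 0.39, 122, and ±1.65 come from the example; the function and variable names are hypothetical.

```python
import math

def z_obtained_proportion(ps, pu, n):
    """Z(obtained) for a one-sample test of a proportion (large sample)."""
    standard_error = math.sqrt(pu * (1 - pu) / n)
    return (ps - pu) / standard_error

z = z_obtained_proportion(ps=0.43, pu=0.39, n=122)
z_critical = 1.65  # alpha = 0.10, two-tailed, as stated in Step 3

print(round(z, 2))          # 0.91
print(abs(z) > z_critical)  # False, so fail to reject the null hypothesis
```

Because 0.91 lies between -1.65 and +1.65, the sample outcome is outside the critical region, in line with the decision in Step 5.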

Summary of Unit 7

 All tests of a hypothesis involve finding the probability of the observed sample

outcome, given that the null hypothesis is true.

 The five-step model will be our framework for decision making throughout the

hypothesis testing chapters. What we do during each step, however, will vary,

depending on the specific test being conducted.

 If we can predict a direction for the difference in stating the research hypothesis, we use a one-tailed test. If no direction can be predicted, a two-tailed test is appropriate.

 There are two kinds of errors in hypothesis testing. Type I, or alpha error, is

rejecting a true null hypothesis; Type II, or beta error, is failing to reject a false null

hypothesis. The probabilities of committing these two types of error are inversely

related.

Basic Terms

 Alpha level (α)

 Critical region (region of rejection)

 Five-step model

 Hypothesis testing

 Null hypothesis (Ho)

 One-tailed test

 Research hypothesis (H1)

 Significance testing

 Student’s t distribution

 t(critical)

 t(obtained)

 Test statistic

 Two-tailed test

 Type I error (alpha error)

 Type II error (beta error)

 Z(critical)

 Z(obtained)