Words in orange represent glossary terms. You can locate the Glossary in Appendix 1.

Hypothesis Testing

In research, one of the most common activities is testing hypotheses. The Afrobarometer data set is a survey of African citizens’ attitudes on democracy, governance, the economy, and other related topics (www.afrobarometer.org). Using this data set, you might want to examine whether rural and urban citizens differ, on average, in how much they trust the government. The tables below present results from an independent samples t-test of this question using a random sample of 43 participants from the complete data set. Each respondent’s score is a value between 0 and 15, with a higher score indicating greater trust. The mean for the urban group is 7.00 (SD = 4.17) and the mean for the rural group is 7.74 (SD = 4.38). The observed value of the t-statistic is -.564 and the p-value equals .576 (see the column labeled “Sig. (2-tailed)”).

African Citizens' Attitudes on Democracy

Trust in Government Index (higher scores = more trust)

t        df    Sig. (2-tailed)    Mean Difference    Std. Error Difference
-.564    41    .576               -.73913            1.30978

Group Statistics

Trust in Government Index (higher scores = more trust)

Urban or Rural Primary Sampling Unit    N     Mean      Std. Deviation    Std. Error Mean
Urban                                   20    7.000     4.16754           .93189
Rural                                   23    7.7391    4.38196           .91370

The p-value is the probability of obtaining a t-statistic at least as extreme as the observed value (less than or equal to -.564, or greater than or equal to +.564) if you were to repeat the test with a new sample of data and the null hypothesis were true. You will see in this Skill Builder that the p-value can easily be used to make statistical decisions in hypothesis testing. However, while the p-value is important in determining statistical significance, it does not tell the whole story.
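As a rough check on the output above, the t-statistic and two-tailed p-value can be recomputed from the group summary statistics alone. The sketch below is not how SPSS computes its result: it assumes equal variances (a pooled test), assumes group sizes of 20 and 23 (the sizes consistent with the reported df = 41 and the reported standard errors of the mean), and approximates the tail area by numerical integration using only the Python standard library. The function names are illustrative.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(u * u / df))


def two_tailed_p(t_value, df, steps=2000):
    """P(|T| >= |t_value|) under the null, via Simpson's rule on [0, |t_value|]."""
    bound = abs(t_value)
    if bound == 0:
        return 1.0
    h = bound / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(bound, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_value|
    return 1.0 - 2.0 * central_area


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t-test from summary statistics, equal variances assumed."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (mean1 - mean2) / se
    return t_value, df, se, two_tailed_p(t_value, df)


# Urban group first, rural group second.
# n = 23 for the rural group is an assumption inferred from df = 41.
t, df, se, p = pooled_t_test(7.000, 4.16754, 20, 7.7391, 4.38196, 23)
print(f"t({df}) = {t:.3f}, SE of difference = {se:.5f}, two-tailed p = {p:.3f}")
```

Under these assumptions the printed values should land close to the t, standard error, and Sig. (2-tailed) entries in the table.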

Steps of Hypothesis Testing

To interpret p-values, let's review the key steps in hypothesis testing.

One-tailed vs. Two-tailed Tests

One important factor to be aware of is whether the test you are conducting is one-tailed or two-tailed. So far, the hypotheses have been written for a two-tailed test, which means that the alternative hypothesis stated simply that there was a difference between the means, without specifying the direction of the difference. In a one-tailed test, the alternative hypothesis does specify the direction of the difference; that is, it specifies that one of the means (e.g., urban or rural) is expected to be larger than the other.

In a one-tailed test, the p-value will be the area in the test statistic distribution to the right of the observed value if the alternative hypothesis has an “is greater than” sign, and to the left of the observed value if the alternative hypothesis has an “is less than” sign. For example, suppose we had the following hypothesis test: 

For a two-tailed test, as is being illustrated with the Afrobarometer data file, the area beyond the observed value is doubled to obtain the p-value. The reason for doubling is related to setting the rejection region for a two-tailed test. For a two-tailed test, alpha is divided in half (α/2), and the “half-areas” are used to identify rejection regions in both the upper and lower tails of the test statistic’s sampling distribution. 

The doubling of the area beyond the observed value allows the p-value to be compared to alpha to test the null hypothesis.
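To make the relationship between one-tailed and two-tailed p-values concrete, the sketch below computes the area to the left and to the right of the observed t-statistic (-.564 with 41 degrees of freedom) and then doubles the smaller area. It is a minimal illustration that approximates the t distribution by numerical integration with the standard library; the helper names are illustrative, and which tail matters depends on the direction stated in your alternative hypothesis.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(u * u / df))


def area_zero_to(bound, df, steps=2000):
    """Area under the t density between 0 and bound (bound >= 0), Simpson's rule."""
    if bound == 0:
        return 0.0
    h = bound / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(bound, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def left_tail(t_value, df):
    """P(T <= t_value): area to the left of the observed value."""
    central = area_zero_to(abs(t_value), df)
    return 0.5 - central if t_value < 0 else 0.5 + central


t_obs, df = -0.564, 41

p_less = left_tail(t_obs, df)        # alternative has an "is less than" sign
p_greater = 1.0 - p_less             # alternative has an "is greater than" sign
p_two = 2.0 * min(p_less, p_greater)  # two-tailed: double the area beyond t_obs

print(f"one-tailed (less than):    {p_less:.3f}")
print(f"one-tailed (greater than): {p_greater:.3f}")
print(f"two-tailed:                {p_two:.3f}")
```

The two-tailed value should come out close to the .576 reported in the "Sig. (2-tailed)" column, and the two one-tailed areas always sum to 1.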

Again, if alpha had been set equal to .05, the null hypothesis would be retained (that is, you would fail to reject it) because .576 is greater than .05. The data do not provide sufficient evidence that the populations of urban and rural citizens differ in average levels of trust in government.

Keep in mind the following important points related to making a statistical decision and interpreting your p-value:

· By definition, the p-value is the probability of obtaining a value for the test statistic as extreme or more extreme than the observed value if the null hypothesis is true.

· If the p-value is less than alpha, the null is rejected, and the result is said to be statistically significant.

· If the p-value is greater than alpha, then researchers would fail to reject the null hypothesis.
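The decision rule in the points above can be written as a few lines of code. This is only a sketch: the function name is illustrative, and treating a p-value exactly equal to alpha as "fail to reject" is an assumption, since the rule above only covers the less-than and greater-than cases.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to alpha and return the statistical decision."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    # Assumption: p exactly equal to alpha is treated as not significant.
    return "fail to reject the null hypothesis"


# The trust-in-government example: p = .576 with alpha = .05.
trust_decision = decide(0.576, alpha=0.05)
print(trust_decision)
```

Note that alpha must be chosen before looking at the p-value; the function simply applies whatever threshold you set.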

Statistically Significant Results

The final step in conducting a hypothesis test is to link the statistical result to the real world. That is, you need to examine the practical significance, or meaningfulness, of the statistical result.

If the result of the hypothesis test is to retain the null—that is, obtain a non-significant result—the researcher has clearly not identified a meaningful effect. In most hypothesis tests, retaining the null is not what the researcher is hoping to do.  

On the other hand, if you reject the null hypothesis, you will have a statistically significant result. You are, in essence, saying that the result is so unlikely under the assumption of the null being true that the null appears to be false. A false null hypothesis does not mean, however, that the result is scientifically or socially important. When a researcher finds a statistically significant result, knowledge of the research area is used to decide whether the result is important and meaningful. Large effects are more often meaningful than small effects, but there are times when small effects can be important. 

Knowledge of the research area is key in making the decision.

Probably the most frequent concern with meaningless statistically significant results has to do with sample size. With extremely large sample sizes, hypothesis tests can result in rejecting the null even though the effect is small and unimportant from an applied perspective. To understand how this works, let’s take another look at the Afrobarometer data set. Participants in the survey were asked whether they agreed or disagreed with the statement, “People must obey the law.” Responses were made using a five-point Likert scale:

1 = strongly disagree
2 = disagree
3 = neither agree nor disagree
4 = agree
5 = strongly agree

Suppose a researcher wanted to compare the urban and rural populations and tested the null hypothesis H0: μurban = μrural using alpha equal to .05. Unlike the example above, which used a sample of 43 participants, the following results are based on over 50,000 respondents. As shown in the following table, the p-value (Sig (2-tailed)) for this test is .004.

Q48b. People must obey the law

                           t         df       Sig (2-tailed)    Mean Difference
Equal variances assumed    -2.892    50125    .004              -.029

Using APA style, the researcher could report that, on average, the urban population agrees less with the statement than does the rural population, t(50125) = -2.892, p = .004, d = .027, 95% CI [-.039, -.019].

· The statement says the t-test was conducted with 50,125 degrees of freedom or 50,127 participants.

· The p-value of .004 is less than alpha, so the null hypothesis is rejected.

· The d statistic is Cohen’s d, a common measure of effect size.

· The 95% confidence interval for the difference in population means does not contain zero, which is consistent with having rejected the null hypothesis.

There is no doubt the result is statistically significant, but how meaningful is it? The d-statistic is quite useful because it compares the difference in sample means to an average of the standard deviations for the two groups. (The average standard deviation is based on a weighted average of the two sample variances.) According to Cohen, d = .2 is generally considered a small effect, d = .5 a medium effect, and d = .8 a large effect. The value of .027 is little more than 10% of a small effect. The statistically significant result that was obtained is therefore not likely to be important.
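The sketch below illustrates both ideas: how Cohen's d is computed from group summary statistics using a pooled standard deviation, and why a tiny effect can still be statistically significant when the sample is very large. Several things here are assumptions rather than facts from the survey: the rural group size of 23 in the trust example is inferred from df = 41, the equal group sizes in the loop are hypothetical (the actual urban and rural counts are not reported above), and the function names are illustrative. The last function uses the fact that, for a pooled t-test, t equals d multiplied by the square root of n1 × n2 / (n1 + n2).

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def cohen_label(d):
    """Cohen's rough benchmarks: .2 small, .5 medium, .8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


def t_from_d(d, n1, n2):
    """t-statistic implied by effect size d in a pooled two-sample t-test."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Trust-in-government example (rural n = 23 is an inferred assumption).
d_trust = cohens_d(7.000, 4.16754, 20, 7.7391, 4.38196, 23)
print(f"trust example: d = {d_trust:.3f} ({cohen_label(d_trust)})")

# Obey-the-law example: d = .027 as a fraction of a "small" effect (.2).
share_of_small = 0.027 / 0.2
print(f"d = .027 is {share_of_small:.1%} of a small effect")

# Same tiny effect, hypothetical equal group sizes: t grows with sample size.
for n_per_group in (20, 200, 2000, 25000):
    t_value = t_from_d(0.027, n_per_group, n_per_group)
    print(f"n per group = {n_per_group:>6}: t = {t_value:.3f}")
```

With the effect size held fixed, the t-statistic grows with the square root of the sample size, so only the largest hypothetical sample pushes t past the usual two-tailed cutoff of about 1.96 at alpha = .05. The effect itself is no more meaningful at that sample size than at the smallest one.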