
Week 3 Lecture 1

Last week, we explored ways to examine differences between a measure taken on two groups (a two-sample test situation) and to compare that measure to a standard (a one-sample test situation). We looked at the F test, which let us test for equality of variances, and at the t-test, which focused on testing for equality of means. We noted that the t-test has three distinct versions: one for groups with equal variances, one for groups with unequal variances, and one for paired data (two measures on the same subject, such as salary and midpoint for each employee). We also saw how Excel’s two-sample unequal-variance t-test could be used to perform a one-sample mean test against a standard or constant value. This week we expand our tool kit to let us compare multiple groups for similar mean values.

A second tool will let us look at how data values are distributed: if graphed, would the data sets look the same? Different shapes or patterns often mean the data sets differ in significant ways that can help explain results.

As interesting as comparing two groups is, it is often a bit limited in what it tells us. One obvious issue missing from last week’s comparisons was equal work. This idea is still somewhat hard to get a clear handle on. Typically, as we look at this issue, questions arise about things such as performance appraisal ratings, education distribution, seniority impact, etc.

Some of these can be tested with the tools introduced last week. We can see, for example, whether the average performance rating is the same for each gender. What we cannot yet do, however, is see whether performance ratings differ by grade: do the more senior workers perform relatively better? Is there a difference between ratings for each gender by grade level? The same questions can be asked about seniority impact. This week will give us tools to expand how we look at the clues hidden within the data set about equal pay for equal work.

ANOVA

So, let’s start looking at these questions. The first tool for this week is the Analysis of Variance, ANOVA for short. ANOVA often confuses students: it says it analyzes variance (which it does), but the purpose of an ANOVA test is to determine whether the means of different groups are the same! So far, we have considered means and variance to be two distinct characteristics of data sets, characteristics that are not related, yet here we are saying that looking at one will give us insight into the other.

The reason lies in the way the variance is analyzed. Just as our detectives succeed by looking at the clues and data in different ways, so does ANOVA. Two key variances are examined with this test. The first, called Within Group variance, is the average variance of the groups. ANOVA assumes the populations the samples are taken from have the same variance, so this average is an estimate of that common population variance.

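The second, called Between Group variation, treats the data as if all the samples were from the same group. It is driven by how far the group means sit from the overall mean: when the means are close together, the variation of the entire data set is about the same as the average within-group variation, and when they are far apart, the entire data set is much more spread out. Here are exhibits showing two situations. In Exhibit A, the groups are close together (in fact, they overlap), and the means are obviously close to each other. The variation of the entire group (which would be the outside curves on each side) is very close to the average variation of each of the groups.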

So, if we divide our estimate of the overall variation by the estimate of our average group variation, we would get a value close to 1, and certainly less than about 1.5. Recalling the F statistic from last week, we would say there is not a significant difference in the variation estimates.

Look at the three sample distributions in Exhibit A. Each has the same within-group variance, and the overall variance of the entire data set is not much larger than the average variance of the three separate groups. This would give us an F relatively close to 1.00.

Exhibit A: No Significant Difference with Overall Variation (figure: three overlapping group distributions)

Exhibit B: Significant Difference with Overall Variation (figure: separated, non-overlapping group distributions)

Now, if we look at Exhibit B, we see a different situation. Here the group distributions do not overlap, and the means are quite different. If we were to divide the overall variance by the average variance, we would get a value quite a bit larger than the value we calculated with the previous samples, probably large enough to indicate a difference between the within and between group variation estimates.

This is essentially what ANOVA does; we will look at how it does this, and at the Excel output, in the next lecture. If the F statistic is statistically significant (the null hypothesis of no difference is rejected), then we can say that the means are not all the same: at least one differs from the others. Neat!
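To make the ratio idea concrete, here is a minimal Python sketch of the standard one-way ANOVA F statistic using only the standard library. The group data are hypothetical (not from the course data set), and the function name one_way_anova is illustrative. The within-group estimate is the pooled variance, which for equal group sizes equals the simple average of the group variances described above; the between-group estimate is based on how far each group mean sits from the grand mean.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Standard one-way ANOVA: F = MS_between / MS_within.
    """
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = fmean(all_values)

    # Between-group: spread of each group mean around the grand mean,
    # weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: spread of each value around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # between-group variance estimate
    ms_within = ss_within / df_within     # pooled within-group variance estimate
    return ms_between / ms_within, df_between, df_within

# Hypothetical data: overlapping groups (like Exhibit A) ...
close = [[4, 5, 6], [5, 6, 7], [3, 5, 7]]
# ... and well-separated groups (like Exhibit B).
apart = [[4, 5, 6], [9, 10, 11], [14, 15, 16]]

f_close, df1, df2 = one_way_anova(close)
f_apart, _, _ = one_way_anova(apart)
print(f"overlapping groups: F = {f_close:.2f} with df ({df1}, {df2})")
print(f"separated groups:   F = {f_apart:.2f} with df ({df1}, {df2})")
```

The overlapping groups give F = 0.50 and the separated groups give F = 75.00, both with 2 and 6 degrees of freedom. For those degrees of freedom, Excel’s F.INV.RT(0.05, 2, 6) gives a critical value of about 5.14, so only the separated groups would lead us to reject the null of equal means. If SciPy is available, scipy.stats.f_oneway computes the same F along with a p-value.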

So, why bother learning a new tool to test means? Why don’t we merely use multiple t-tests to test each pair separately? Granted, it would take more time than doing a single test, but with Excel that is not much of an issue. The best reason to use ANOVA is to avoid reducing our confidence in our results. If we use an alpha of 0.05, we accept a 5% chance of wrongly rejecting a true null hypothesis; loosely, we are 95% sure that a single test will not lead us to a wrong rejection. However, if we do even 3 t-tests on related data, our confidence that all three decisions are correct drops to roughly .95*.95*.95 = 0.857! And, in comparing means for 6 groups, we have 15 pairwise comparisons, which would reduce our overall confidence that all decisions were correct to about 46%. Not very good. Therefore, a single ANOVA test is much better for our confidence in making the right decision than multiple t-tests.
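The arithmetic behind these confidence figures is easy to check. This sketch assumes, as the simple multiplication above does, that the tests behave as if independent; the helper names are illustrative.

```python
from math import comb

def pairwise_comparisons(num_groups):
    """Number of distinct pairs of groups (k choose 2)."""
    return comb(num_groups, 2)

def confidence_all_correct(alpha, num_tests):
    """Chance that no test gives a false rejection, treating tests as independent."""
    return (1 - alpha) ** num_tests

print(round(confidence_all_correct(0.05, 3), 3))  # 0.857 for three t-tests
pairs = pairwise_comparisons(6)
print(pairs, round(confidence_all_correct(0.05, pairs), 3))  # 15 0.463
```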

The hypothesis testing steps are set up in a similar fashion to what we did with the t-tests. With ANOVA, there is a single way to state the null and alternate hypotheses:

Ho: All means are equal

Ha: At least one mean differs.

The reason for this is simple. No matter how many groups we are testing, if even a single mean differs, we will reject the null hypothesis. And listing every possible combination of one or more means differing as the alternate would quickly get cumbersome.

One issue remains if we reject the null of no differences among the means: which means are different? We answer this by constructing what we can call, for now, difference intervals. A difference interval gives a range of values that the “real” difference between two means could be. Remember, since the means come from samples, they are close approximations of the actual population means, which might be a bit larger or smaller than any given sample mean. Difference intervals take this possible sampling error into account. (How we do this will be discussed in the next lecture this week.)

A difference interval might be -2 to +1.8. This says that the actual difference, when we subtract one mean from the other, could be any value between -2 and +1.8. Since this interval includes 0 (meaning the means could be the same), we would find this pair of means not significantly different. If, however, our difference interval ran, for example, from +1.8 to +3.8 (all positive values), we would say the difference between the means is significant, as 0 is not within the range.
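The decision rule for a difference interval can be written as a one-line check. This sketch only applies the rule to an interval you already have; how the interval itself is built is left to the next lecture, and the function name means_differ is illustrative.

```python
def means_differ(low, high):
    """Return True when a difference interval [low, high] excludes 0."""
    return not (low <= 0 <= high)

print(means_differ(-2.0, 1.8))  # False: 0 is inside the interval
print(means_differ(1.8, 3.8))   # True: every value in the interval is positive
```

An interval with all negative values (for example, -3.8 to -1.8) also excludes 0, so it too marks a significant difference; the sign simply tells us which mean is larger.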

ANOVA is a very useful tool when we need to compare multiple groups. For example, this can be used to see if average shipping costs are the same across multiple shippers. The average time to fill open positions using different advertising approaches, or the associated costs of each, can also be tested with this technique.

Chi Square Tests

The ANOVA test relies in part on the shape of the samples, both through our assumption that each sample is normally distributed with an equal variance and through their relative relationship (how close or distant they are). In many cases, we are more concerned with the distribution of our variables than with other measures. In some cases, particularly with nominal labels, distribution is all we can measure.

In our salary question, one issue that might impact our analysis is knowing if males and females are distributed across the grades in a similar pattern. If not, then whichever gender holds more higher-level jobs would obviously have higher salaries – this might be an affirmative action or possible discrimination issue, but certainly not an equal pay for equal work situation.

So, again, we have some data that we are looking at, but we are not sure how to decide whether things are the same or not. And, as with our examination of means, we cannot just look at the data we have, as it is a sample and not completely accurate.

But have no fear, statistics comes to our rescue! Examining distributions, or shapes, or counts per group (all ways of describing the same data) is done using a version of the Chi Square test; and, once we set up the data, Excel does the work for us.

We can compare distributions for discrete variables (such as the number of employees in each grade) or for continuous variables (such as age or years of service, which can take any value within a range if measured precisely enough) that we divide into ranges. Either way, we simply count how many values fall in each group or range. For something like the distribution of gender by grade, simply count how many males and females are in each grade: simple, even if a bit tedious. For something like compa-ratio, we first set up the ranges we are interested in (such as .80 up to but not including .90, etc.), and then count how many values fall within each range.
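Counting values into ranges like these is mechanical, so a short sketch may help. The compa-ratio values below are hypothetical, not taken from the course data set, and the function name count_in_ranges is illustrative. Each range includes its lower bound but not its upper bound, matching the “.80 up to but not including .90” convention.

```python
def count_in_ranges(values, edges):
    """Count values in half-open ranges [edges[i], edges[i+1]).

    Values below the first edge or at/above the last edge are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical compa-ratios, not taken from the course data set.
compa_ratios = [0.85, 0.92, 1.00, 1.05, 0.88, 1.15, 0.95, 1.10, 0.99]
edges = [0.80, 0.90, 1.00, 1.10, 1.20]
print(count_in_ranges(compa_ratios, edges))  # [2, 3, 2, 2]
```

In Excel, COUNTIFS with a ">=" condition on the lower bound and a "<" condition on the upper bound produces the same counts.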

These counts are displayed in tables, such as the following on gender distribution by grade. The first is the distribution of employees by grade level for the entire sample, and the second is the distribution by gender. The question we ask of both kinds of tables is basically the same: is the difference large enough to be statistically significant, that is, meaningfully different from our comparison standard?

Grade      A    B    C    D    E    F
Overall   15    7    5    5   12    6

Grade      A    B    C    D    E    F
Male       3    3    3    2   10    4
Female    12    4    2    3    2    2

The answer to the question of whether the distributions are different enough, when using the Chi Square test, depends on the group we are comparing the distribution with. When we are dealing with a single-row table, we need to decide what our comparison group or distribution is. For example, we could compare the existing distribution or shape against a claim that the employees are spread out equally across the 6 grades, with 50/6 = 8.33 employees in each grade. Or we could compare the existing distribution against a pyramid shape, a more typical organizational hierarchy with the most employees at the lower grades (A and B) and fewer at the top; for example, 17, 10, 8, 7, 5, 3. The expected frequency per cell does not need to be a whole number. What is important is having some justification for the comparison distribution we use.
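Here is a minimal sketch of the Chi Square goodness-of-fit statistic, the sum of (observed - expected)^2 / expected over the cells, applied to the overall grade counts and both comparison distributions above. The function name is illustrative, and the critical value is hard-coded from a standard chi-square table as an assumption rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must match")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 7, 5, 5, 12, 6]              # grades A-F, overall counts
equal = [sum(observed) / len(observed)] * 6  # 50/6 = 8.33 per grade
pyramid = [17, 10, 8, 7, 5, 3]               # the pyramid example

# Critical value for alpha = 0.05 and df = 6 - 1 = 5, taken from a
# chi-square table (Excel: CHISQ.INV.RT(0.05, 5)). Hard-coded assumption.
CRITICAL_05_DF5 = 11.070

for name, expected in [("equal", equal), ("pyramid", pyramid)]:
    stat = chi_square_statistic(observed, expected)
    decision = "reject Ho" if stat > CRITICAL_05_DF5 else "do not reject Ho"
    print(f"{name}: chi-square = {stat:.2f} -> {decision}")
```

With these counts, the equal-spread claim gives about 10.48 (not significant at 0.05), while the pyramid claim gives about 15.63 (significant). A common rule of thumb asks for expected counts of at least 5 in each cell; the pyramid’s top grade (3) falls below that, so treat that result with some caution. In Excel, CHISQ.TEST returns the p-value directly from the actual and expected ranges.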

When we have multi-row tables, such as the second example with 2 rows, the comparison distribution comes from the existing counts themselves; it is basically an average of the rows, scaled to each row’s total. We will get into exactly how to set this up in the next lecture. In either case, the comparison (or “expected”) distribution needs to have the same row and column totals as the original (actual) counts.
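For a multi-row table, the usual textbook formula for each expected count is row total times column total divided by the grand total. The next lecture covers the Excel setup, so treat this as a sketch of that standard formula rather than the course’s exact worksheet; the function name expected_counts is illustrative. Because the male and female rows here each total 25, every expected row is simply the average of the two actual rows, which matches the description above.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [
    [3, 3, 3, 2, 10, 4],  # Male, grades A-F
    [12, 4, 2, 3, 2, 2],  # Female, grades A-F
]
expected = expected_counts(observed)

# The expected table keeps the actual row and column totals.
assert [sum(r) for r in expected] == [sum(r) for r in observed]
assert [sum(c) for c in zip(*expected)] == [sum(c) for c in zip(*observed)]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected[0])  # [7.5, 3.5, 2.5, 2.5, 6.0, 3.0]
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

Here the statistic is about 11.94 with 5 degrees of freedom, just above the 0.05 critical value of about 11.07, so the test would point to a difference in how the genders are spread across grades. Several expected counts are below 5, though, so the result is best treated as suggestive.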

The hypothesis claims for either chi square test are basically the same:

Ho: Variable counts are distributed as expected (a claim of no difference)

Ha: Variable counts are not distributed as expected (a claim that a difference exists)

Comparing distributions/shapes has a lot of uses in business. Manufacturing generally produces parts that have some variation in key measures; we can use the Chi Square test to see whether the differences from the specification value are normally distributed, or whether the distribution is changing over time (indicating that something, such as machine tolerances, is changing). The author used this approach to compare the distribution/pattern of responses to questions on an employee opinion survey between departments and the overall division. Different response patterns suggested the issue was a departmental one, while similar patterns suggested that the division “owned” the results, indicating which group should develop ways to improve the results.

If you have any questions on this material, please ask your instructor.

After finishing with this lecture, please go to the first discussion for the week, and engage in a discussion with others in the class over the first couple of days before reading the second lecture.