Statistics
BUS 308 Week 2 Lecture 2
Lecture 1 introduced the logic of statistical testing using the hypothesis testing procedure. It also mentioned that we will be looking at two different tests this week. The t-test is used to determine if means differ, either from a standard or claim or from another group. The F-test is used to examine variance differences between groups.
This lecture will look at sampling a bit more, and then focus on how to use the hypothesis testing procedure and the F-test for variance to determine if the male and female compa-ratio distributions have equal variances. We will also look at how we can use Excel to perform the F-tests for variance differences.
We need to remember that seeing different values (mean, variance, etc.) from different samples does not tell us that the population parameters we are estimating are, in fact, different. The one thing we know about sampling is that each sample will be a bit different. Samples generally provide a “close enough” estimate of the population values of concern for decision and action purposes, but they are not an exact match. Statistical tests examine this difference and tell us how much importance we should attach to observed differences.
Lecture Examples
The lectures for each week will also look at our class question of whether or not males and females are paid equally for equal work. These additional analyses provide some different clues on what the data is trying to tell us about company pay practices.
While your analysis will focus directly on the salary that males and females are being paid, the lecture examples will use an alternate method of examining pay practices. Compensation professionals often use a relative pay measure called the “comparison-ratio,” or compa-ratio, to examine pay patterns within the company.
Some background on this measure. Many companies use grades to group jobs of equal value to the company into groups that have a similar pay range, that is, the values that a company is willing to pay employees for the job. (As strong a performer as a mail room clerk may be, they will rarely be paid the same as the CEO.) Many companies will set the middle of this range, the midpoint, as the average salary that the market pays to hire someone into the job. This is how companies remain competitive in their hiring.
Now, compensation professionals will generally want to analyze how the company is paying employees relative to these market rates (as summarized by the midpoint). One approach is to divide each employee’s salary by its related midpoint. The outcome is the compa-ratio which is considered an alternate measure of pay that eliminates the impact of different grades. The compa-ratio reports pay as a ratio of the actual salary divided by the salary grade’s midpoint.
The compa-ratio shows if an employee is being paid more than the midpoint (measure’s value > 1.0) or less than the midpoint (< 1.0). This measure allows us to look at salary dispersion within a company without focusing on the exact dollar values. It allows a comparison between what the company is paying and what the outside market is paying (which most companies target as the midpoint of a salary range) for the jobs.
The compa-ratio shows if employees are paid above or below the grade midpoint, and it can be used to see the dispersion pattern of pay. With equal pay, we would expect to see similar distributions (variances and means) for males and females on this measure.
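As a rough illustration of the calculation described above, the short Python sketch below divides each salary by its grade midpoint. The function name and the three salary/midpoint pairs are made up for illustration; they are not taken from the course data set.

```python
def compa_ratio(salary, midpoint):
    """Return the salary divided by the midpoint of its salary grade."""
    if midpoint <= 0:
        raise ValueError("midpoint must be positive")
    return salary / midpoint


# Hypothetical (salary, grade midpoint) pairs, both in the same units.
employees = [(60.0, 48.0), (42.0, 48.0), (67.0, 67.0)]

ratios = [compa_ratio(salary, midpoint) for salary, midpoint in employees]

for (salary, midpoint), ratio in zip(employees, ratios):
    position = "above" if ratio > 1.0 else "below" if ratio < 1.0 else "at"
    print(f"salary {salary}, midpoint {midpoint}: compa-ratio {ratio} ({position} midpoint)")
```

A value above 1.0 marks an employee paid above the grade midpoint, and a value below 1.0 marks one paid below it, regardless of the grade involved.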
The lecture examples will cover the same statistical tests as the homework assignments but will focus on the compa-ratio pay measure rather than salary. As such, the results presented each week should be included and/or factored into your weekly conclusions on what the data has told us about the answer to our question.
F-test
As noted, the F-test is used to compare variances to determine if the differences noted could be from simple sampling error (also known as pure chance alone) or if the differences are large enough to be considered statistically significant. The F statistic is simply one group’s variance divided by the other group’s variance. When done by hand, it is traditional to have the larger variance in the numerator, but this is not critical when Excel performs the test for us.
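The ratio described above can be sketched in a few lines of Python. This is only an illustration using the standard library's sample variance; the two small groups are invented values, not compa-ratios from the class data.

```python
from statistics import variance  # sample variance (divides by n - 1)


def f_statistic(sample_1, sample_2):
    """Sample variance of the first group divided by that of the second."""
    return variance(sample_1) / variance(sample_2)


# Made-up values for illustration only.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]    # sample variance 2.5
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # sample variance 10.0

f_ab = f_statistic(group_a, group_b)   # group_a in the numerator
f_ba = f_statistic(group_b, group_a)   # group_b in the numerator

print(f_ab, f_ba)
```

Swapping the two groups simply inverts the ratio, which is why the order matters less when software reports the p-value for us.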
One new concept introduced with the F-test is the idea of degrees of freedom (df). While the technical explanation is somewhat tedious, we can understand the concept with a simple example. Suppose we have 5 numbers, for example 1, 2, 3, 4, and 5; we also have their sum, in this case 15. Now, assume we can change any of the numbers in the data set with the only requirement being that the total must remain the same. How many of the numbers are we free to change; or what is our degree of freedom in making changes? In this case, we can change any 4 of the values; as soon as we do so, we automatically get the fifth value (whatever is needed to make the sum equal to 15). Thus, to generalize, our df is the count we have minus 1 (n - 1). This n - 1 is the formula for the degrees of freedom for each variable in the F-test. We will see this idea in other statistical tests, each of which has its own formula to calculate it. The nice thing is that Excel will give us this outcome without our needing to worry about it, and we rarely have to actually use it in any of our work, but it is technically part of most statistical tests.
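The five-number example can be checked with a small Python sketch. The helper name and the four replacement values are arbitrary choices made for illustration.

```python
def forced_last_value(free_values, required_total):
    """Return the one remaining value needed to keep the total fixed."""
    return required_total - sum(free_values)


original = [1, 2, 3, 4, 5]
total = sum(original)            # 15, the total that must be preserved

# Pick any four values freely (arbitrary choices for illustration).
free_choice = [10, -3, 7, 0]

# The fifth value is no longer free: it is whatever restores the total.
last = forced_last_value(free_choice, total)

degrees_of_freedom = len(original) - 1   # n - 1

print(last, degrees_of_freedom)
```

Whatever four values are chosen, the fifth is determined by the fixed total, which is why a set of n values leaves n - 1 free choices.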
There are two versions of the F-test available for use. One is located in the Data ribbon under the Analysis block in the Data Analysis link and is called F-Test Two-Sample for Variances. The other is located in the Fx Statistical list (which is duplicated in the Formulas ribbon under the More Functions option and the Statistical list) and is called simply F.TEST.
While both test variances, there is an important difference. The F-Test Two-Sample for Variances option provides some additional summary statistics (mean, variance, count) for each sample, but only provides a one-tail test outcome. One-tail results, whether with the F-test or the t-test, are used to test a directional difference, when we want to know if one variance is larger (or smaller) than the other. In general we are interested in the simpler question of whether the variances are equal or not (without regard to which is larger), so when using this form to test for equality we need to double the one-tail p-value to find the two-tail p-value we need for our decision on rejecting the null hypothesis.
On the other hand, the F.TEST function found in Fx or Formulas returns only the two-tail p-value: enough for a decision on rejecting or failing to reject the null hypothesis of no difference, but nothing else. Technically, this is the version we should use when conducting our two-tail questions in the homework, but (as noted) either can be used if we remember to double the p-value for the one-tail outcome.
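For readers curious about where the one-tail and two-tail values come from, here is a minimal Python sketch that computes both from the F distribution using only the standard library. It is not Excel's algorithm: it assumes each group has at least 3 observations, approximates the distribution by Simpson's rule, and takes the one-tail value to be the smaller tail area. The function names and the sample variances in the usage lines are hypothetical.

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b), approximated by Simpson's rule.

    Assumes a >= 1 and b >= 1 so the integrand stays bounded on [0, x].
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(t):
        return norm * t ** (a - 1) * (1.0 - t) ** (b - 1)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0


def f_test_p_values(var_1, n_1, var_2, n_2):
    """Return (F, one-tail p, two-tail p) for two sample variances."""
    df_1, df_2 = n_1 - 1, n_2 - 1            # degrees of freedom, n - 1 each
    f = var_1 / var_2                        # F statistic: variance ratio
    x = df_1 * f / (df_1 * f + df_2)
    lower = beta_cdf(x, df_1 / 2.0, df_2 / 2.0)   # P(F <= f) under the null
    one_tail = min(lower, 1.0 - lower)       # assumption: report the smaller tail
    two_tail = min(1.0, 2.0 * one_tail)      # doubled for the two-tail question
    return f, one_tail, two_tail


# Hypothetical variances for two samples of 25 each (not the lecture's data).
f_value, p_one, p_two = f_test_p_values(0.0060, 25, 0.0045, 25)
print(f"F = {f_value:.3f}, one-tail p = {p_one:.4f}, two-tail p = {p_two:.4f}")
```

The last line of `f_test_p_values` is the doubling step: the two-tail value is twice the one-tail value, capped at 1.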
Example: Testing for Variance Equality
As mentioned above, it is often beneficial to start with looking at variance equality when comparing groups. We need to start our analysis of equal pay for equal work by seeing if there is even an issue to be concerned with. So, we have selected our random sample of 25 males and 25 females from our corporate population. (A couple of assumptions: the company exists in only one location, and all our employees in the sample are exempt professionals or managers with at least a bachelor’s college degree.)
Our initial question is: Are the male and female compa-ratio variances equal? (Note, if they are, this would mean that the standard deviations of both groups are the same.) As with all statistical tests, we will be using our samples to make judgements or inferences about the population values. While the sample result values will differ, this difference may not be large enough to show that the population values are not the same.
Question 1. The first question in this week’s assignment asks if the salary variances for males and females are equal within the population. This example will show how to answer the question using the compa-ratio data. The logic and approach for answering the salary-based question is the same as shown below.
Data Set-up. Before we even start with any test, we need to set up the data. The set-up is simple and straightforward. Probably the simplest approach, using compa-ratio for our examples, is to copy the entire compa-ratio set of values (all 50) as well as the label and paste them into column S (for example). Then copy the entire column of Gender1 M/F labels and paste them into column T. Use Data Sort to sort both columns by Gender1. Then copy the male values and paste them in a column labeled Males (perhaps in column Q). Do the same for the female values under the label Females (in column R). (While this step is not really needed when using F.TEST, it is helpful in keeping all of our variables aligned nicely.) Here is a screen shot of what this might look like for you. (Ignore the Ho column, that is for another question later on.)
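Outside of Excel, the same set-up amounts to splitting one list of values into two groups by the M/F label. The Python sketch below is one way to picture it; the function name and the four sample values are invented for illustration and are not the class data.

```python
def split_by_gender(values, genders):
    """Split paired values into (male_values, female_values) by M/F label."""
    groups = {"M": [], "F": []}
    for value, gender in zip(values, genders):
        groups[gender].append(value)
    return groups["M"], groups["F"]


# Hypothetical compa-ratios with their matching M/F labels.
compa = [1.02, 0.97, 1.10, 0.95]
gender = ["M", "F", "F", "M"]

males, females = split_by_gender(compa, gender)

print("Males:", males)
print("Females:", females)
```

Each resulting list plays the role of one of the two columns (Males and Females) that are later handed to the variance test.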
Variance equality is tested using the F-test. We have noted that there are two versions of this test that we could use, and both will be shown below. Note that equal standard deviations do not automatically mean that the means are close; they just tell us that the dispersion patterns are similar. If similar, the means of each group can be considered equally reflective of the data. The following focuses on just setting up the data for and performing the statistical test.
The following shows the setting up of the hypothesis testing steps and conducting of the F-test to answer our question about the equality of male and female compa-ratio variance. (Note: again, you will perform these steps for salary variance in your homework.)
Step 1: Ho: Male compa-ratio variance = Female compa-ratio variance
Ha: Male compa-ratio variance =/= Female compa-ratio variance
(Since the question asks about equality and not a directional difference, this is a two-tail test. The Null must contain the names of the two variables involved (Male and Female), the statistic being tested (variance), and the relationship sign (=). The alternate provides the opposite view so that between them all possible outcomes are covered. We are only concerned if the variances are equal, not whether one or the other is larger (or smaller).)
Step 2: Alpha = 0.05
(This will be the same for all statistical tests we perform in the class, and therefore the same in all hypothesis set-ups.)
Step 3: F statistic and F-test for Variance. We use these as they are designed to test variance equality.
(In this step, identify the statistic being used, the specific test to be used, and why the test was chosen.)
Step 4: Reject the null hypothesis if the p-value is < alpha = 0.05.
(This step is also the same for every statistical test we will perform; it says we will reject the null hypothesis if the probability of getting a result as large as or larger than what we see, assuming the null hypothesis is true, is less than 5%, or a probability of 0.05.)
Note that these steps are set up before we even look at the data. While we may have set up the data columns, we should not have done any analysis yet. These steps tell us how we will make a decision from the results we get.
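Because the decision rule in Step 4 is fixed before any data is examined, it can be written down as a tiny function ahead of time. The Python sketch below is only an illustration; the function name and the two p-values passed to it are hypothetical.

```python
ALPHA = 0.05  # Step 2: the same alpha for every test in the class


def decide(p_value, alpha=ALPHA):
    """Step 4 rule: reject the null hypothesis only if p-value < alpha."""
    return "Reject" if p_value < alpha else "Do not reject"


# Hypothetical p-values, used only to show both outcomes of the rule.
print(decide(0.20))   # p-value is not below alpha
print(decide(0.01))   # p-value is below alpha
```

Writing the rule first makes it clear that the data only supplies the p-value; how that p-value is judged was settled in advance.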
Step 5: Perform the analysis. This step provides the cell location to place your results in, K10 for this question. This cell location should be placed in the output range box.
(This is when we set up and perform the statistical test. The assignment each week says where the output table should be placed.)
Below is a screen shot of locating the F-test Two-Sample for Variance in the Analysis ToolPak list. Here is a screenshot of where the F.Test is found in the fx Statistical list.
Either test can be used for this question. Note that one data set is entered into the Variable 1 Range box, and the other in the Variable 2 Range box. This can be done in either of two ways. First, just type the starting and ending cell identifiers into the range box separated by a colon, such as Q1:Q26. The dollar signs that show up tell Excel that this is an absolute (unchangeable) reference; this is a useful feature, but not absolutely needed. The other approach, used in the above example, is to left-click on the up arrow box on the right end of each data entry box. This opens another window. Simply move your cursor to the top of the data range you want to enter, click on the top value, drag the cursor down to the end and release the button. Then click on the blue box next to the data range, and your input will show up in the set-up box.
The labels box should be checked if you included labels in your ranges. Hint: if you checked this box but did not actually have labels, the output will show data values as your labels and the count will be one less than the 25 males and females we have in our data set. This is a great clue that you need to go back and re-enter the data ranges to include the label and all of the data points. The final entry is in the output range box; type or highlight the cell you want to use for your upper left-hand corner of the result. Clicking OK (upper right) will produce your result.
The F.TEST is even simpler to set up. Going to Fx (or Formulas), then the Statistical list, and selecting F.TEST will produce a data entry box that simply asks for each data range, as with the top entries in the F-Test shown above. Complete them in the same way, and select OK. (Do not include labels in these ranges.) The F.TEST outcome shows up in the cell your cursor was on when you opened the Fx link.
Here is a screenshot of the data entry box for the F-Test Two-Sample for Variances. Note that the compa-ratios have been copied over to columns headed by labels of Male and Female. This lets our test results show the label for each group. Also note that for this screenshot, the results are placed next to the data columns, while in your assignment K10 should be listed in the Output Range box. Always enter the variables in the order listed in the null hypothesis statement; since the male values were entered in the Variable 1 range, the hypothesis statements should list the male variable first.
Here is a screenshot of the results for both versions of the F-test. (Only one is needed for the question.)
Before moving on to interpreting these results, let’s look at what we have. The F-Test Two-Sample for Variances output clearly has more information than the F.TEST. We have the labels identifying each group as well as the mean, variance, and count (Observations) for each group. The df, equaling the sample size - 1, is shown as well as the calculated F statistic, which equals the left group’s (Male in this case) variance divided by the right group’s (Female) variance. The next two rows are critical for our decision making, but they are incomplete. They show the one-tail p-value and the one-tail critical value used in decision making. The P(F<=f) one-tail is the p-value, or the probability of getting an F value as large as or larger than the one we have if the null hypothesis is true. However, it is only a one-tail outcome, while we want a two-tail outcome. We only care about the variances being equal or not, not which one is larger. So, the result as presented cannot be used directly. What we need to do will be covered after we look at the F.TEST result.
The F.TEST gives less information, but it provides us with exactly what we need: the two-tail p-value. For our data set, we have a 40% (rounded) chance of getting an F value this large or larger purely by chance alone when we are looking at a two-tail outcome. Note that this value, 0.39766, is twice the one-tail value of 0.19883 from the F-Test table. This will always be true; the two-tail probability will be twice the one-tail value.
So, if we want to use the F-Test Two-Sample for Variances tool, we need to double the p-value before making our step 6 decisions. We are now ready to move on to step 6.
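The doubling can be checked with the two values reported above. This short Python sketch uses only the lecture's own numbers (0.19883 and 0.39766); the variable names are chosen for illustration.

```python
import math

one_tail_p = 0.19883       # P(F<=f) one-tail from the F-Test Two-Sample table
f_test_two_tail = 0.39766  # two-tail value returned by F.TEST

doubled = 2 * one_tail_p   # convert the one-tail result to a two-tail p-value

print(round(doubled, 5))
print(math.isclose(doubled, f_test_two_tail))
```

Either route therefore leads to the same two-tail p-value to carry into step 6.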
Step 6: Conclusions and Interpretation. This step has several parts.
What is the p-value: 0.3977 (our compa-ratio example result).
(Where the p-value for this variance test is found depends upon which F-test was used. If the Fx F.TEST was used, the p-value will be found in cell K10, and we can enter =K10 as our answer.
If we used the Analysis ToolPak’s F-Test Two-Sample for Variances tool, the one-tail p-value is found in the P(F<=f) one-tail row. Since this is a one-tail value, we double it and enter the result as our answer.)
What is your decision: REJ or Not reject the null? NOT Rej (our compa-ratio result)
(If our p-value is < (less than) 0.05, we say REJ, if the p-value is > (larger than) 0.05, we say NOT reject. This is what our decision rule says to do.)
Why? The p-value is > (greater than) 0.05. (The compa-ratio result)
(The answer here is simply why, based on the reasoning shown above, you chose your REJ or NOT Rej choice.)
What is your conclusion about the variances in the population for the male and female salaries? We do not have enough evidence to say that the variances differ in the population, so we treat the variances as equal. (Our compa-ratio result.)
(Here is where we take our statistical decision and translate it back to a verbal answer to the initial question. If we do not reject the null, we continue to treat the relationship expressed in the null as correct, and we say we do not have enough evidence to say the variances differ. If we reject the null, we say what the alternate hypothesis says: the variances are unequal. With a two-tail test, we do not make any claims about which variable is larger.)
VIDEO Link: Here is a video on the F-Test Two Sample for Variances: https://screencast-o-matic.com/watch/cbQuFRIwDX
Please ask your instructor if you have any questions about this material.
When you have finished with this lecture, please respond to Discussion thread 2 for this week with your initial response and responses to others over a couple of days.