BUS 308 Week 2 Lecture 2

Statistical Testing for Differences – Part 1

After reading this lecture, the student should know:

1. How statistical distributions are used in hypothesis testing.

2. How to interpret the F-test (both options) produced by Excel.

3. How to interpret the T-test produced by Excel.

Overview

Lecture 1 introduced the logic of statistical testing using the hypothesis testing procedure. It also mentioned that we will be looking at two different tests this week. The t-test is used to determine if means differ, from either a standard or claim or from another group. The F-test is used to examine variance differences between groups.

This lecture starts by looking at statistical distributions, which underlie the entire statistical testing approach. They are kind of like the detective’s base belief that crimes are committed for only a couple of reasons: money, vengeance, or love. The statistical distribution that underlies each test assumes that statistical measures (such as the F value when comparing variances and the t value when looking at means) follow a particular pattern, and this can be used to make decisions.

While the underlying distributions differ for the different tests we will be looking at throughout the course, they all have some basic similarities that allow us to examine the t distribution and extrapolate from it to interpreting results based on other distributions.

Distributions. The basic logic for all statistical tests: If the null hypothesis claim is correct, then the statistical outcome will be distributed around a central value, and values farther from that center (larger or smaller) will be increasingly rare. At some point (and we define this as our alpha value), we can say that getting a difference this large is extremely unlikely, and we will say that our results do not seem to come from a population that matches the claims of the null hypothesis.

Note that this logic has several key elements:

1. The test is based on an assumption that the null hypothesis is correct. This gives us a starting point, even if later proven wrong.

2. All sample results are turned into a statistic that matches the test selected (for example, the F statistic when using the F-test, or the t-statistic when using the T-test.)

3. The calculated statistic is compared to a related statistical distribution to see how likely our outcome is.

4. The farther the test statistic falls from the center of its distribution, the more unlikely it is that the result matches or comes from the population described by the null hypothesis claim.
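The logic in these four elements can be illustrated with a small simulation. The sketch below is only an illustration, not part of the Excel workflow: it repeatedly draws two samples of 25 from the same population (so the null hypothesis is true by construction), computes an equal-variance t statistic for each pair, and counts how often the result lands beyond the two-tail critical value of 2.010635 that appears in this lecture's t-test output for df = 48. The population mean (1.06), standard deviation (0.08), seed, and number of trials are assumptions chosen for the example.

```python
import math
import random
import statistics

rng = random.Random(308)   # fixed seed so the run is repeatable
T_CRITICAL = 2.010635      # two-tail critical t for df = 48 (from the lecture's t-test table)
TRIALS = 2000              # assumed number of repeated experiments
SAMPLE_SIZE = 25           # matches the lecture's group sizes


def pooled_t(sample_1, sample_2):
    """t statistic for two means, using the equal-variance (pooled) standard error."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_variance = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / standard_error


def draw_sample():
    """Draw one sample from the single, shared population (assumed normal)."""
    return [rng.gauss(1.06, 0.08) for _ in range(SAMPLE_SIZE)]


extreme_count = 0
for _ in range(TRIALS):
    # Both samples come from the same population, so any difference is sampling error.
    if abs(pooled_t(draw_sample(), draw_sample())) > T_CRITICAL:
        extreme_count += 1

share_extreme = extreme_count / TRIALS
print(f"Share of |t| values beyond {T_CRITICAL}: {share_extreme:.3f}")
```

Most of the simulated t values cluster near 0, and only a small share (close to alpha = 0.05) fall beyond the critical value. That rarity is what lets us treat a large observed statistic as evidence against the null hypothesis.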

We will demonstrate these ideas by looking at the questions being asked in this week’s homework. We will show results of the related Excel tests, and discuss how to interpret the output.

We need to remember that seeing different values (mean, variance, etc.) from different samples does not tell us that the population parameters we are estimating are, in fact, different. The one thing we know about sampling is that each sample will be a bit different. They generally provide a “close enough” estimate to the population values of concern for decision and action purposes. But they are not an exact match. This difference is examined by the use of the statistical tests, which tell us how much importance we should attach to observed differences.

Lecture Examples

The lectures for each week will also look at our class question of whether or not males and females are paid equally for equal work. These additional analyses provide some different clues on what the data is trying to tell us about company pay practices.

While your analysis will focus directly on the salary that males and females are being paid, the lecture examples will use an alternate method of examining pay practices. Compensation professionals often use a relative pay measure called the “comparison-ratio,” or compa-ratio, to examine pay patterns within the company.

Some background on this measure. Many companies use grades to group jobs of equal value to the company into groups that have a similar pay range, that is, the values that a company is willing to pay employees for the job. (However strong a performer a mail room clerk is, they will rarely be paid the same as the CEO.) Many companies will set the middle of this range, the midpoint, as the average salary that the market pays to hire someone into the job. This is how companies remain competitive in their hiring.

Now, compensation professionals will generally want to analyze how the company is paying employees relative to these market rates (as summarized by the midpoint). One approach is to divide each employee’s salary by its related midpoint. The outcome is the compa-ratio which is considered an alternate measure of pay that eliminates the impact of different grades. The compa-ratio reports pay as a ratio of the actual salary divided by the salary grade’s midpoint.

The compa-ratio shows if an employee is being paid more than the midpoint (measure’s value > 1.0) or less than the midpoint (< 1.0). This measure allows us to look at salary dispersion within a company without focusing on the exact dollar values. It allows a comparison between what the company is paying and what the outside market is paying (which most companies target as the midpoint of a salary range) for the jobs.
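As a quick illustration of the calculation just described, the sketch below divides each salary by its grade midpoint. The employee labels, salaries, and midpoints are made-up values (in thousands of dollars) used only to show the arithmetic; they are not taken from the course data set.

```python
# Hypothetical (label, salary, grade midpoint) records, in thousands of dollars.
employees = [
    ("A", 57.0, 57.0),
    ("B", 27.0, 31.0),
    ("C", 66.0, 57.0),
]


def compa_ratio(salary, midpoint):
    """Return the salary divided by the midpoint of its pay grade."""
    return salary / midpoint


ratios = {name: compa_ratio(salary, midpoint) for name, salary, midpoint in employees}

for name, ratio in ratios.items():
    if ratio > 1.0:
        position = "above"
    elif ratio < 1.0:
        position = "below"
    else:
        position = "at"
    print(f"Employee {name}: compa-ratio {ratio:.3f} ({position} the midpoint)")
```

Because each salary is scaled by its own grade's midpoint, employees in very different grades can be compared on the same footing.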

The compa-ratio shows if employees are paid above or below the grade midpoint, and it can be used to see the dispersion pattern of pay. If pay is equal, we would expect to see similar distributions, variances, and means for males and females in this measure.

The lecture examples will cover the same statistical tests as the homework assignments but will focus on the compa-ratio pay measure rather than salary. As such, the results presented each week should be included and/or factored into your weekly conclusions on what the data has told us about the answer to our question.

The first step in looking at whether males and females are paid equally would be to look at the average pay of each. Given our sample is a random sample of the population of employees (and, therefore, considered to be representative of the population), the average salaries or average compa-ratios (they are related but not identical measures of pay) will give us an indication of whether things are the same for each gender or not.

One issue in looking at averages is the variation within the groups. If both groups have the same or very similar variation across the salaries, then we test the averages for a difference using one approach. If the group variances are significantly different, we use a slightly different approach. So, the first step is to examine group variances. This is done with the F-test.

F-test

As noted, the F-test is used to compare variances to determine if the differences noted could be from simple sampling error (also known as pure chance alone) or if the differences are large enough to be considered statistically different. The F statistic is simply one group’s variance divided by the other group’s variance. (When done by hand, it is traditional to have the largest variance in the numerator, but this is not critical when Excel performs the test for us.) So, if the variances are equal, then the result of one variance divided by the other would equal 1.0, which is the center of the F distribution. How about a situation where one variance equaled 4 and the other equaled 5 (randomly picked numbers for this example)? If we divided the larger by the smaller, we would get 5/4 = 1.25, while if we divided the smaller by the larger, we would get 4/5 = 0.8. Note that these values are on each side of the center value of 1.00. This is what is meant by “two tails” with the F-test: one tail of the distribution has values less than 1.0 while the other has values greater than 1.0. Our value of F depends first on the variances (of course) and then on how we do the division. The likelihood of these two variances coming from populations that have the same variance does not depend upon which tail the result is in, but rather on how likely it is to see a difference this far from 1.00. This is given to us by the F-test p-value (the probability of seeing a difference as large or larger than what we have if the null hypothesis is true).
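The division described above can be sketched in a few lines. The first part reuses the lecture's variances of 4 and 5; the second part computes an F statistic from two small samples whose values are hypothetical and chosen only so the arithmetic is easy to follow. This shows how the statistic is formed, not how Excel computes the p-value.

```python
import statistics


def f_statistic(sample_1, sample_2):
    """Return the sample variance of sample_1 divided by that of sample_2."""
    return statistics.variance(sample_1) / statistics.variance(sample_2)


# The lecture's example: one variance equals 4 and the other equals 5.
var_a, var_b = 4.0, 5.0
f_larger_on_top = var_b / var_a    # falls above 1.0 (right tail)
f_smaller_on_top = var_a / var_b   # falls below 1.0 (left tail)
print(f"5/4 = {f_larger_on_top}, 4/5 = {f_smaller_on_top}")

# Hypothetical raw samples: the order of the groups decides which tail F lands in.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
f_1_over_2 = f_statistic(group_1, group_2)
f_2_over_1 = f_statistic(group_2, group_1)
print(f"group_1/group_2 = {f_1_over_2}, group_2/group_1 = {f_2_over_1}")
```

The two versions of each ratio are reciprocals of one another, which is why the order of division changes the tail but not the strength of the evidence.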

One new concept introduced with the F-test is the idea of degrees of freedom (df). While the technical explanation is somewhat tedious, we can understand the concept with a simple example. Suppose we have 5 numbers, for example 1, 2, 3, 4, and 5; we also have their sum, in this case 15. Now, assume we can change any of the numbers in the data set with the only requirement being that the total must remain the same. How many of the numbers are we free to change; or what is our degree of freedom in making changes? In this case, we can change any 4 of the values; as soon as we do so, we automatically get the fifth value (whatever is needed to make the sum equal to 15). Thus, to generalize, our df is the count we have minus 1 (equaling n - 1). This n - 1 is the formula for the degrees of freedom for each variable in the F-test. We will see this idea in other statistical tests, each of which has its own formula to calculate it. The nice thing is that Excel will give us this outcome without our needing to worry about it, and we rarely have to actually use it in any of our work; still, it is technically part of most statistical tests.
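The degrees-of-freedom example can be traced step by step. In the sketch below, the four replacement values are arbitrary choices made for illustration; whatever they are, the fifth value is forced by the requirement that the total stay at 15.

```python
values = [1, 2, 3, 4, 5]
required_total = sum(values)  # the total that must be preserved

# Freely choose new values for the first n - 1 positions (arbitrary picks).
free_choices = [10, 0, -3, 6]

# The last value is not free: it must bring the sum back to the required total.
forced_last = required_total - sum(free_choices)
new_values = free_choices + [forced_last]

degrees_of_freedom = len(values) - 1

print(f"New values: {new_values} (sum = {sum(new_values)})")
print(f"Degrees of freedom: {degrees_of_freedom}")
```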

There are two versions of the F-test available for use. One is located in the Data ribbon under the Analysis block in the Data Analysis link and is called F-Test Two-Sample for Variances. The other is located in the Fx Statistical list (which is duplicated in the Formulas ribbon under the More Functions option and the Statistical list) and is called simply F.test.

While both test variances, there is an important difference. The F-Test Two-Sample for Variances option provides some additional summary statistics (mean, variance, count) for each sample, but only provides a one-tail test outcome. One-tail results, whether with the F-test or the T-test, are used to test a directional difference, when we want to know if one value is larger (or smaller) than the other. Since we are generally interested in the simpler question of whether the variances are equal or not (without regard to which is larger), when using this form to test for equality we need to double the p-value to find the two-tail p-value we need for our decision on rejecting the null hypothesis.

On the other hand, the F.test found in Fx or Formulas returns only the two-tail p-value; enough for a decision on rejecting or failing to reject the null hypothesis of no difference but nothing else. Technically, this is the version we should use when conducting our two-tail questions in the homework, but (as noted) either can be used if we remember to double the p-value for the one-tail outcome.

Example: Testing for Variance Equality

As mentioned above, it is often beneficial to start with looking at variance equality when comparing groups. We need to start our analysis of equal pay for equal work by seeing if there is even an issue to be concerned with. So, we have selected our random sample of 25 males and 25 females from our corporate population. (A couple of assumptions: the company exists in only one location, and all our employees in the sample are exempt professionals or managers with at least a bachelor’s degree.)

Our initial question is: Are the male and female compa-ratio variances equal? (Note, if they are, this would mean that the standard deviations of both groups are the same.) As with all statistical tests, we will be using our samples to make judgements or inferences about the population values. While the sample result values will differ, this difference may not be large enough to show that the population values are not the same.

Question 1. One of the first things of interest to detectives is if the behavior of the suspects differs from what they normally do. That is, whose behavior varies from the norm? Relating this to our compa-ratio measures has us asking if the compa-ratio variances for males and females are equal within the population. (In the homework, the question asks about salary variances. The logic and approach for answering the salary-based question is the same as shown below.)

Variance equality is tested using the F-test. There are two versions of this test available to us that we could use, and both will be shown below. Note that equal standard deviations do not automatically mean that the means are close; they just tell us if the dispersion patterns are similar. If similar, the means of each group can be considered equally reflective of the data. The following focuses on just setting up the data for and performing the statistical test.

The following shows only the output for the six hypothesis testing steps. How the Excel F-tests are set up is covered in Lecture 3 for this week.

Step 1: The question asked is whether the variances for males and females are equal. The hypothesis statements for an equality test are shown below.

Ho: Male compa-ratio variance = Female compa-ratio variance

Ha: Male compa-ratio variance =/= Female compa-ratio variance

(Since the question asks about equality and not a directional difference, this is a two-tail test. The Null must contain the names of the two variables involved (Male and Female), the statistic being tested (variance), and the relationship sign (=). The alternate provides the opposite view so that between them all possible outcomes are covered. We are only concerned if the variances are equal, not whether one or the other is larger (or smaller).)

Step 2: We state our decision-making criteria here. It is: Alpha = 0.05 (This will be the same for all statistical tests we perform in the class, and therefore the same in all hypothesis set-ups.)

Step 3: The test, test statistic, and the reason for selecting the test are stated here. For this example, we are using: F statistic and F-test for Variance. We use these as they are designed to test variance equality.

Step 4: Our decision-making rule is presented here: Reject the null hypothesis if the p-value is < alpha = 0.05.

(This step is also the same for every statistical test we will perform; it says we will reject the null hypothesis if the probability of getting a result as large as what we see is less than 5% or a probability of 0.05.)

Note that these steps are set up before we even look at the data. While we may have set up the data columns, we should not have done any analysis yet. These steps tell us how we will make a decision from the results we get.

Step 5: Perform the analysis. This is the step where Excel performs the analysis and produces output tables. The setting up of each Excel test is covered in Lecture 3; in this lecture, we are primarily interested in how to interpret the results.

Here is a screenshot of the results for both versions of the F-test. (Only one is needed for the question.)

Step 6: Conclusions and Interpretation. This is where we interpret what the data is trying to tell us.

Before moving on to interpreting these results, let’s look at what we have. The F-Test Two-Sample for Variances output clearly has more information than the F.TEST. We have the labels identifying each group as well as the mean, variance, and count (Observations) for each group. The df, equaling the (sample size - 1), is shown as well as the calculated F statistic (which equals the left group’s (Male in this case) variance divided by the right group’s (Female) variance). Note that Excel divides the variances in the order that they were entered into the data entry box; for this example, the Males were entered first.

The next two rows are critical for our decision making, but they are incomplete. They show the one-tail values used in decision making. The P(F<=f) one-tail is the p-value, or the probability of getting an F-value as large or larger than we have if the null hypothesis is true. However, it is only a one-tail outcome, while we want a two-tail outcome, since we only care about the variances being equal or not, not which one is larger. So, the result as presented cannot be used directly. What we need to do will be covered after we look at the F.TEST result.

The F.TEST gives less information, but it provides us with exactly what we need: the two-tail p-value. For our data set, we have a 40% (rounded) chance of getting an F value this large or larger purely by chance alone when we are looking at a two-tail outcome. Note that this value, 0.39766, is twice the one-tail value of 0.19883 from the F-Test table. This will always be true: the two-tail probability will be twice the one-tail value.

So, if we want to use the F-TEST Two-Sample for Variance tool, we need to double the p-value before making our step 6 decisions.

We are now ready to move on to what step 6 asks for. This step has several parts.

• What is the p-value: 0.3977 (our compa-ratio example result). This value equals EITHER the F.TEST outcome or 2 times the F-Test Two-Sample for Variances one-tail result. (If the F-test one-tail p-value is in cell K15, you could enter =2*K15 to get the value desired.)

• What is your decision: REJ or Not reject the null? (If our p-value is < (less than) 0.05, we say REJ, if the p-value is > (larger than) 0.05, we say NOT reject. This is what our decision rule says to do.) Our answer is for compa-ratio variances: NOT Rej. This means we do not reject the claim made by the null hypothesis and accept it as the most likely description of the variances within the population.

• Why? This line asks us to explain why we made the decision we did. Our compa-ratio response is: The p-value is > (greater than) 0.05, and the decision rule is to reject if the p-value is < 0.05. (The answer here is simply why, based on the reasoning shown above, you chose your REJ or NOT Rej choice.)

• What is your conclusion about the variances in the population for the male and female salaries? This part asks us to translate the statistical decision into a clear answer to the initial question (Are male and female compa-ratio variances equal?). Our response: We do not have enough evidence to say that the variances differ in the population, so we treat the variances as equal in the population. Had we rejected the null hypothesis, we would have said the population variances differed.
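The step 6 reasoning for the variance question can be written out as a short sketch. It uses the one-tail p-value of 0.19883 reported in the lecture's F-Test table, doubles it as described above, and applies the step 4 decision rule. The function name `decide` and the label strings are illustrative choices, not Excel output.

```python
ALPHA = 0.05  # the decision criterion used for every test in this class


def decide(p_value, alpha=ALPHA):
    """Apply the step 4 rule: reject the null only if the p-value is below alpha."""
    return "REJ" if p_value < alpha else "NOT Rej"


one_tail_p = 0.19883          # P(F<=f) one-tail from the F-Test Two-Sample table
two_tail_p = 2 * one_tail_p   # same doubling as entering =2*K15 in Excel

decision = decide(two_tail_p)
print(f"two-tail p-value = {two_tail_p:.4f}, decision: {decision}")
```

The doubled value matches what F.TEST returns directly, so both routes lead to the same decision.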

Note that this question did not tell us anything about actual pay differences between the genders. It did tell us that both groups are dispersed in a similar manner, and thus supported some of the conclusions we drew from looking at the data last week.

Examples: Testing for Mean Equality

While we test for variance equality with an F-test, we use the T-test to test for mean equality. The t-test also uses the degrees of freedom (df) value in providing us with our probability result; but again, Excel does the work for us. The t distribution is a bell-shaped curve that is flatter and a bit more spread out than the normal curve we discussed last week. The center is located at 0 (zero) and the tails (the negative and positive values) are symmetrical.

The t statistic for the testing of two means is basically: (Mean1 – Mean2)/standard error estimate. (The standard error formula varies according to which type of t-test we are performing.) Note that the t will be either positive or negative depending upon which mean is larger. So, if we are interested in simply equal or not equal, it does not matter if we have a positive or negative t value, only the size of the difference matters. As with the F-test, half of our alpha goes in the positive tail and half goes in the negative tail when making our equality decision.

We have two questions about means this week.

Question 2. The second question for this week asks about salary mean equality between males and females. Again, the set-up for this question is covered in Lecture 3; we are concerned here with the interpretation of the results. (Note, the comments about each step made above apply to this example as well, but they will not be repeated except for specific information related to the t-test outcome.) Specific differences from the variance example of question 1 will be highlighted with italics. Again, the results discussed with each step are shown in Lecture 2.

Since the question asks if male and female compa-ratios (salaries in the homework) are equal, we have an equal versus non-equal hypothesis pair.

Step 1: Ho: Male compa-ratio mean = Female compa-ratio mean

Ha: Male compa-ratio mean =/= Female compa-ratio mean

Step 2: The decision criterion is constant: Alpha = 0.05

Step 3: t statistic and t-test, assuming equal variances. We use these as they are designed to test mean equality, and we are assuming (and according to the F-test have) equal variances.

Step 4: Again, our decision rule is the same: Reject the null hypothesis if the p-value is < alpha = 0.05.

Step 5: Perform the analysis. Here is the screen shot for the results, using the same data as with Question 1.

As with the F-test output, the t-test starts with the test name, the group names, and some descriptive statistics. Line 4 starts with a new result, the Pooled Variance; this is a weighted average of the sample variances since we are assuming that the related population variances are equal. The next line, Hypothesized Mean Difference, shows up only if a value was entered in the data input box setting up the test (discussed in Lecture 3).

Next comes our friend degrees of freedom, which equals the sum of both sample sizes minus 2 (or N1 - 1 + N2 - 1). The calculated T value (similar to the calculated F value in the F-test output) comes next. Note that since we have a negative t Stat, it falls in the left tail of the t distribution. This is important in one-tail tests but not in two-tail tests. Following the calculated T value come the one- and two-tail decision points. The one-tail p-value is found in the P(T<=t) one-tail row, followed by the T critical one-tail value. The two-tail results follow.

Step 6: Conclusions and Interpretation. This step has several parts.

What is the p-value? 0.571 (our rounded compa-ratio example result).

(Since we again have a two-tail test, we use the P(T<=t) two-tail result.)

What is your decision: REJ or Not reject the null? NOT Rej (our compa-ratio result)

Why? The p-value (0.571) is > (greater than) 0.05. (The compa-ratio result)

What is your conclusion about the means in the population for the male and female salaries? We do not have enough evidence to say that the means differ in the population. So, our conclusion is that the means are equal in the population. (Our compa-ratio result.)

Question 3

The third question for this week asks about salary differences based on educational level rather than gender. Since education is a legitimate reason to pay someone more, it will be helpful to see if a graduate degree results in a higher average pay. Note that this question has a directional focus (do employees with an advanced degree (degree code = 1) have higher average salaries?). This means we must develop a directional set of hypothesis statements. We will use the terms UnderG (for undergraduate degree code 0) and Grad (for graduate degree code 1) in these statements. Again, the results discussed with each step are shown in Lecture 2.

Step 1: Ho: UnderG mean compa-ratio >= Grad mean compa-ratio

Ha: UnderG mean compa-ratio < Grad mean compa-ratio

(Note the way the inequalities are set up. Since the question asks whether degree code 1 (Grad) means are larger, the question becomes the alternate hypothesis, as it does not contain an = claim. These can be written with the Grad mean listed first, but the arrow heads must point to Grad, showing an expectation that Grad means are larger.)

Step 2: Alpha = 0.05 (Our constant decision criterion)

Step 3: t statistic and t-test assuming equal variances. We use these as they are designed to test mean equality. (The variance equality assumption is part of the question set-up.)

Step 4: Reject the null hypothesis if the p-value is < alpha = 0.05. (Our constant decision rule.)

Step 5: Perform the analysis. Here is a screen print for a T-test on the question of whether the graduate and undergraduate degree compa-ratio means in the population are equal or not. Because we are assuming equal variances, we are using the T-Test Two-Sample Assuming Equal Variances form.

t-Test: Two-Sample Assuming Equal Variances

                                UnderG      Grad
Mean                            1.05172     1.07324
Variance                        0.00581     0.005999
Observations                    25          25
Pooled Variance                 0.005904
Hypothesized Mean Difference    0
df                              48
t Stat                          -0.99016
P(T<=t) one-tail                0.163529
t Critical one-tail             1.677224
P(T<=t) two-tail                0.327059
t Critical two-tail             2.010635

The table output is read exactly the same way as with the question 2 table, with the exception that we are interested in the one-tail outcome, so we use the (highlighted) one-tail p-value row in our decision making.
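To see where the rows of this output come from, the sketch below recomputes the pooled variance, df, t Stat, and one-tail p-value from the summary statistics reported for UnderG and Grad. Excel uses its own internal routines; here the p-value is approximated by numerically integrating the standard t density with Simpson's rule, which is an assumption of this illustration rather than Excel's method. Because the inputs are the rounded values printed in the table, the results agree with the table only to about four decimal places.

```python
import math

# Summary statistics as printed in the t-test output.
mean_underg, mean_grad = 1.05172, 1.07324
var_underg, var_grad = 0.00581, 0.005999
n_underg, n_grad = 25, 25

df = n_underg + n_grad - 2
pooled_variance = ((n_underg - 1) * var_underg + (n_grad - 1) * var_grad) / df
standard_error = math.sqrt(pooled_variance * (1 / n_underg + 1 / n_grad))
t_stat = (mean_underg - mean_grad) / standard_error


def t_pdf(x, dof):
    """Density of the t distribution with dof degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coeff * (1 + x * x / dof) ** (-(dof + 1) / 2)


def lower_tail_p(t_value, dof, steps=2000):
    """Approximate P(T <= t_value) using Simpson's rule and the curve's symmetry."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area_zero_to_t = total * h / 3
    return 0.5 - area_zero_to_t if t_value < 0 else 0.5 + area_zero_to_t


one_tail_p = lower_tail_p(t_stat, df)
two_tail_p = 2 * one_tail_p

print(f"df = {df}, pooled variance = {pooled_variance:.6f}")
print(f"t Stat = {t_stat:.5f}")
print(f"one-tail p = {one_tail_p:.6f}, two-tail p = {two_tail_p:.6f}")
```

With equal group sizes, the pooled variance is simply the average of the two sample variances, and the negative t Stat reflects that the UnderG mean is the smaller of the two.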

Step 6: Conclusions and Interpretation. This step has several parts.

• What is the p-value? 0.164 (our rounded compa-ratio example result).

(Since we have a one-tail test this time, we use the P(T<=t) one-tail result.)

• Is the t value in the t-distribution tail indicated by the arrow in the Ha claim? Yes. The t-value is negative, and the Ha arrow points to the left (or negative) tail of the t distribution. (Since we only care about a difference in one direction, the result must be consistent with the desired direction. Only large negative values are of interest in this case/set-up, since our difference is calculated by (UnderG – Grad); large negative values show a larger Grad salary. If we had said Grad <= UnderG in the Null, the alternate arrow would have pointed to the right or positive tail, and a positive t would have been needed.)

• What is your decision: REJ or Not reject the null? NOT Rej (our compa-ratio result)

• Why? The p-value is > (greater than) 0.05. Because we do not reject, the sign does not change the decision in this case, but the t value is in the correct (negative) tail. (The compa-ratio result)

• What is your conclusion about the impact of education on average salaries? We do not have enough information to suggest that graduate degree holders have a higher average salary than undergraduate degree holders. (Our compa-ratio result.)
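The one-tail reasoning in this step has two parts: the t value must fall in the tail that Ha points to, and the one-tail p-value must be below alpha. The sketch below encodes both checks using the t Stat and one-tail p-value from the table. The function name and the `expected_sign` argument are illustrative choices, and the second and third calls use made-up values to show the other possible outcomes.

```python
ALPHA = 0.05


def one_tail_decision(t_stat, one_tail_p, expected_sign, alpha=ALPHA):
    """Reject only if t is in the tail named by Ha AND the one-tail p-value < alpha.

    expected_sign is -1 when Ha points to the left (negative) tail and
    +1 when Ha points to the right (positive) tail.
    """
    in_expected_tail = t_stat < 0 if expected_sign < 0 else t_stat > 0
    if in_expected_tail and one_tail_p < alpha:
        return "REJ"
    return "NOT Rej"


# Lecture result: Ha is UnderG < Grad, so with (UnderG - Grad) we expect a negative t.
decision = one_tail_decision(-0.99016, 0.163529, expected_sign=-1)
print(f"Lecture result: {decision}")

# Hypothetical values: a small p-value in the expected tail would be rejected...
print(one_tail_decision(-2.5, 0.008, expected_sign=-1))
# ...but the same p-value with t in the wrong tail would not.
print(one_tail_decision(2.5, 0.008, expected_sign=-1))
```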

Question 4

While the week 1 salary results suggest that males and females are not paid the same, this week’s compa-ratio tests still do not suggest any inequality. Gender compa-ratio variances and means are not significantly different. A somewhat surprising result was that graduate degree holders did not have significantly higher compa-ratios.

We still cannot answer our equal pay for equal work question, however, as we have not yet developed a measure of pay for equal work. Compa-ratios do remove the impact of grades, but too many other work-related variables still need to be examined.

Summary

The F and t tests are used to determine if, based upon random sample results, the population parameters can reasonably be said to differ. The F-test looks for differences in population variances, while the t-test examines population mean differences. Both tests are performed as part of the hypothesis testing procedure, and are always done in step 5.

Differences in sample results can be transformed into statistical distributions that allow us to determine the probability or likelihood of getting a difference as large or larger than we found. It is this transformation that allows us to make our decisions about the differences we see in the results.

When either test is set-up using the Data | Analysis toolpak function, these tests will provide summary sample descriptive statistics for the mean, variance, and count as well as the calculated statistic, the critical value of the statistic, and the p-value. When set-up using Excel’s Fx or Formula functions, only the p-value is returned.

For both tests, if the appropriate p-value is less than the specified alpha (always 0.05 in this class), we reject the null hypothesis and say the alternate is the more likely description of the population.

We can test for a simple difference (called a two-tail test) where it does not matter which group has the larger value, or we can use a directional test (called a one-tail test) where we are concerned about which variable is larger (or smaller). The null and alternate hypotheses define which difference we are looking for.

The t-test has three versions: equal variances, unequal variances, and paired. The paired test is used when we have two measures on each subject (such as the salary and midpoint for each employee). The F-test is used to help us decide if we need to use the equal or unequal variance form of the t-test.

The Analysis toolpak F-test provides only a one-tail outcome, so we need to double its p-value when testing for simple variance differences. The Fx (or Formula) F.TEST function returns the two-tail p-value directly.

Please ask your instructor if you have any questions about this material.

When you have finished with this lecture, please respond to Discussion Thread 2 for this week with your initial response and responses to others over a couple of days.