Unit 4 psych stat assignment
Hypothesis Testing and the Distribution of Means PSY3200 Unit 4
Defining Hypothesis Testing
In this unit we take the concept of the normal distribution and finally connect it with an experiment, through a process known as hypothesis testing. Hypothesis testing is a procedure for deciding whether the results of a study, which examines a sample, support a particular theory about a population; it uses the normal curve, probability, and sampling (Aron, Coups, & Aron, 2013). The steps for hypothesis testing are as follows: (1) identify your populations and state your hypotheses; (2) identify the characteristics of the comparison distribution; (3) determine the cutoff score at which you would reject your null hypothesis; (4) calculate your test score; (5) compare your test score to the cutoff score and decide whether or not to reject your null hypothesis (Aron et al., 2013).
Hypothesis Testing Steps
Step 1 has 2 parts. The first is to define our populations. To do this we always state (1) who is in this particular group, and (2) what we are measuring (Aron et al., 2013). It is important not to state here what we think will happen; that belongs in the hypothesis. Let's take the example hypothesis of "a new technique of studying will increase statistics scores." In this scenario we have two groups, one studying a new way (our experimental group), and one studying the old way (the comparison group). For both groups we are measuring the same thing: their stats scores. So, for the first part of step 1 we will define our populations as:
P1 = Stats scores for those who study the new way
P2 = Stats scores for those who study the old way
The other half of step 1 is to state our hypotheses, and we will have two of them: the research hypothesis and the null hypothesis. The research hypothesis states that there will be a change or a difference (Aron et al., 2013). In this case we are saying that the new way of studying will increase our grades, so our hypothesis will be:
H1: P1 > P2
This translates to: the stats scores for people who study the new way will be greater than the scores for those who study the old way. The null hypothesis states there is no change or difference between the experimental and comparison distributions; in other words, our independent variable will not affect our dependent variable (Aron et al., 2013):
H0: P1 = P2
This setup may seem unusual, but there is a reason for it. In the media today, we are always hearing that things have been proven to be true, whether it is a new product, a new study, or a new way of doing something. If you think about it, can you ever really prove something is true? If I say all cars are red and I go out to try and prove that, I could spend 3 years of my life counting red cars; and even if I only found red cars, there is always the possibility that somewhere a blue car exists and I just haven't seen it yet. I cannot prove a hypothesis true, but if I did see a blue car, this would disprove my hypothesis. In short, I cannot prove a hypothesis, but I can disprove one by finding contradictory evidence. However, disproving a research hypothesis is not an efficient way to do research, so this is where the rationale of the null hypothesis comes in. The null hypothesis states the experiment does not work (the IV does not affect the DV), so we try to disprove that null hypothesis and reject it. If we can, the research supports our hypothesis (Aron et al., 2013). If we cannot, then we fail to reject the null hypothesis. This does not prove the null hypothesis is true; it only means the results are inconclusive and this study did not give us enough evidence that the experiment worked.
Hypothesis Testing and the Distribution of Means https://www.softchalkcloud.com/lesson/files/I6GkcsD5hvEgzu/less...
1 of 4 7/1/20, 10:29 AM
Step 2 is to identify the characteristics of the comparison distribution (Aron et al., 2013). In order to see if the experiment worked, we must compare our result to the scores of those who did not receive the manipulation and check if there is a difference. If I were computing a Z score, Z = (X-M)/SD, I would find the X, the M, and the SD in this step. Any piece I'm missing I would calculate. For our example we would calculate the M and SD of people who study the old way, and the X is the stats score for the person studying the new way.
Step 3 is to find the cutoff score for the comparison distribution. It is in this step where we ask: at what point is our score different enough from the comparison distribution to say it was caused by our IV (Aron et al., 2013)? Let's say the average score on our stats test was a 75. If you study the new way and get a 76, it is clearly higher, but would you say it is different enough that it's because of our new way of studying? Most likely you'd say that's not enough of a difference. If you get a 97, however, then you might be more inclined to say it was because of the experiment. The question step 3 addresses is: at what point do we say it is enough? Because we are dealing with a sample we know we can never be 100% certain, but if a score this extreme would turn up less than 5% of the time when the new way of studying makes no difference, then we are willing to say the difference is real. The key to step 3 is to find the Z score that will give us that answer. Luckily the problems we did in the last unit will help us out. If we look at the normal curve, we know the mean is right in the middle (with a Z score of 0) and the 5% chance of error is out in the tail, so we just have to figure out what Z score leaves 5% in the tail. That Z score is 1.64; we call this the 0.05 significance level or α = 0.05. If we wanted to be stricter, we could allow only a 1% chance of getting such a result when there is no real effect. In this case we'd need to reach a Z score of 2.33.
In both scenarios we had a positive Z score because we expected the scores to increase or go up. This is called a directional hypothesis. In a directional hypothesis the researcher clearly indicates what direction she thinks the results will go in (Aron et al., 2013). In these types of problems, you would see key words like increase, rise, decrease, or lower. There are also non-directional hypotheses, in which the researcher indicates something will change but does not specify in which direction (Aron et al., 2013). These would use key words such as effect, change, or difference. In these scenarios, you would need two other Z cutoff scores: +/-1.96 for the 0.05 significance level and +/-2.57 for the 0.01 significance level. Why do we need these two different cutoff scores? We said using 1.64 allowed for just a 5% chance that the results we got were due to chance, but if we are saying our result could fall in either direction from the mean, then we need to acknowledge both tails of the curve with that 5%. We take that 5% and split it evenly between the two tails of the distribution, so the top 2.5% and the bottom 2.5%. The same setup would be used for the 0.01 significance level.
In the end we are left with 4 cutoff Z scores: for α = 0.05 and a 1-tailed test, 1.64 or -1.64; for α = 0.01 and a 1-tailed test, 2.33 or -2.33; for α = 0.05 and a 2-tailed test, +/-1.96; and for α = 0.01 and a 2-tailed test, +/-2.57.
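As a rough check on these cutoffs, the sketch below recomputes them from the normal curve using Python's standard library. The function name z_cutoff is made up for this illustration. The exact values round to 1.645 and 2.576, so they differ slightly from the table values of 1.64 and 2.57 used in this unit.

```python
from statistics import NormalDist

def z_cutoff(alpha, tails=1):
    """Positive Z cutoff for a significance level and a number of tails (1 or 2)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Split alpha across the tails, then find the Z score with that much area above it.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    for tails in (1, 2):
        print(f"alpha = {alpha}, {tails}-tailed: Z = {z_cutoff(alpha, tails):.3f}")
```

For a directional hypothesis that predicts a decrease, the same value is used with a negative sign.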
Step 4 is to solve the final equation. In this case we are solving for Z so we would follow the steps for solving for Z = (X-M)/SD.
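A minimal sketch of step 4, using the scores of 76 and 97 from the step 3 discussion and the comparison mean of 75. The standard deviation of 10 is a hypothetical value chosen only for illustration; the lesson does not give one.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

comparison_mean = 75   # average stats score from the text
comparison_sd = 10     # hypothetical value, not given in the lesson

for score in (76, 97):
    print(score, z_score(score, comparison_mean, comparison_sd))
```

Under that assumed SD, a 76 gives Z = 0.1 and a 97 gives Z = 2.2, so only the 97 is past the 1-tailed 0.05 cutoff of 1.64.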
In step 5 we compare the Z score from step 4 (your Z score) to the cutoff from step 3, and we decide whether to reject or fail to reject the null hypothesis. The rule is: if our score is further from the mean than the cutoff, we reject the null; if it is not, we fail to reject the null (Aron et al., 2013). For example, if our Z score was 2 and the cutoff Z score was +/-1.96, we would reject because our score is further from the mean (which has a Z score of 0) than the cutoff. If we had a Z score of 1 and the same cutoff score (+/-1.96), then we would fail to reject. Some students do get tripped up on the wording: reject sounds bad, so many students think it means the study is wrong, but you have to remember what you are rejecting: the null hypothesis, the one that says the experiment didn't work. To help with steps 3, 4, and 5 of hypothesis testing, a set of practice problems with answers was added to an accompanying PowerPoint.
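The decision rule in step 5 can be sketched as a small function. The name decide and its parameters are invented for this illustration; the cutoff is passed in as a positive number, and the direction only matters for a 1-tailed test.

```python
def decide(z, cutoff, tails=2, direction="increase"):
    """Return the step 5 decision for a Z score and a positive cutoff value."""
    if tails == 2:
        extreme = abs(z) > cutoff      # either tail counts
    elif direction == "increase":
        extreme = z > cutoff           # upper tail only
    else:
        extreme = z < -cutoff          # lower tail only
    return "reject the null" if extreme else "fail to reject the null"

print(decide(2.0, 1.96))   # the first example in the text
print(decide(1.0, 1.96))   # the second example in the text
```

Note that a large negative Z score does not lead to rejection when the hypothesis predicted an increase, because only the upper tail counts in that case.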
Distribution of Means
In our example we used the Z score formula (X-M)/SD. This formula is used when you are comparing a single score to a distribution of individual scores, but we are almost never using just a single person for an experiment. We often compare a sample to a population: we take a group of people from a population, apply a manipulation, and see how they compare to the population they came from (Aron et al., 2013). This would update our Z score formula to Z = (M-µ)/σ: the mean of the sample minus the mean of the population, divided by the standard deviation of the population. The problem with this is that we rarely know the mean of the population (it is usually impractical to measure everyone).
To solve this problem, we come up with a mean that will represent the population, and we do that by repeatedly sampling the population (Aron et al., 2013). A single group is not an ideal stand-in for the population, since we could happen to draw a group of outliers; by taking multiple samples, we have a better chance of getting a good representation (Aron et al., 2013). We take a sample of 30 people from the population and calculate their mean; if we do this 20 times, calculating the mean each time, we now have a set of 20 scores made up of means: a distribution of means. This distribution is a good substitute for the population because it draws on enough scores that we can comfortably feel it represents the population. The more people we sample from the population, the more like the population our distribution will be (Aron et al., 2013). We will use this distribution as our comparison distribution, and because we are saying the distribution represents the population, the mean of the distribution of means is equal to the mean of the population; so: µm = µ.
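The repeated sampling described above can be tried out in a short simulation. This is only a sketch: the population mean of 75 and standard deviation of 10 are hypothetical values, and the population is assumed to be normal. The 20 samples of 30 people match the numbers in the text.

```python
import random
from statistics import mean, stdev

random.seed(4)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 75, 10            # hypothetical population values
SAMPLE_SIZE, NUM_SAMPLES = 30, 20    # the sizes used in the text

def sample_mean():
    """Draw one sample from the population and return its mean."""
    scores = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return mean(scores)

# The distribution of means: one mean per sample.
means = [sample_mean() for _ in range(NUM_SAMPLES)]

print("mean of the sample means:", round(mean(means), 2))
print("SD of the sample means:  ", round(stdev(means), 2))
```

The mean of the sample means should land close to the population mean, while the sample means themselves are bunched much more tightly than individual scores would be.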
The same cannot be said about the variance, however. The distribution of means is made up of numbers that are means or averages, and if you recall, the definition of the mean was the balance point of the distribution. The variance, which describes how much the scores deviate from the mean, will be lower because the distribution is made up of less extreme scores (since all the scores are means) (Aron et al., 2013). To calculate the variance of the distribution of means we take the variance of the population and divide by the sample size: σm² = σ²/n. The standard deviation of the distribution of means (called the standard error) is the square root of that: σm = σ/√n, the standard deviation of the population divided by the square root of the sample size.
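The two formulas can be checked with a few lines of code. The population standard deviation of 10 and the sample size of 25 are hypothetical numbers picked so the arithmetic comes out evenly.

```python
import math

def variance_of_means(pop_variance, n):
    """Variance of the distribution of means: population variance divided by n."""
    return pop_variance / n

def standard_error(pop_sd, n):
    """Standard deviation of the distribution of means: sigma / sqrt(n)."""
    return pop_sd / math.sqrt(n)

pop_sd = 10   # hypothetical population standard deviation
n = 25        # hypothetical sample size

print(variance_of_means(pop_sd ** 2, n))   # 100 / 25
print(standard_error(pop_sd, n))           # 10 / 5
```

Here the variance of the distribution of means is 4 and the standard error is 2, which is the square root of 4, so the two formulas agree.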
The rest of the steps of hypothesis testing with a distribution of means are the same, except we use an updated formula for Z scores: Z = (M-µm)/σm. To review how we would do a hypothesis test with a distribution of means: (1) define your populations and your research and null hypotheses; (2) define the comparison distribution by calculating µm and σm; (3) determine the cutoff score at which you would reject your null hypothesis; (4) calculate your Z score; (5) decide whether to reject your null hypothesis.
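Putting the five steps together for a sample mean might look like the sketch below. The function name and all of the numbers in the example call (a sample of 25 students with a mean of 79, a population mean of 75, and a population SD of 10) are hypothetical. The 1-tailed branch assumes the research hypothesis predicts an increase, and the cutoff is computed exactly rather than taken from the rounded table.

```python
import math
from statistics import NormalDist

def z_test_for_mean(sample_mean, pop_mean, pop_sd, n, alpha=0.05, tails=1):
    """Z test for one sample mean; returns (z, cutoff, reject_null)."""
    # Step 2: characteristics of the comparison distribution (distribution of means).
    mu_m = pop_mean
    sigma_m = pop_sd / math.sqrt(n)
    # Step 3: positive cutoff Z score for the chosen significance level.
    cutoff = NormalDist().inv_cdf(1 - alpha / tails)
    # Step 4: Z score of the sample mean on the comparison distribution.
    z = (sample_mean - mu_m) / sigma_m
    # Step 5: compare. The 1-tailed case assumes a predicted increase.
    reject = abs(z) > cutoff if tails == 2 else z > cutoff
    return z, cutoff, reject

z, cutoff, reject = z_test_for_mean(sample_mean=79, pop_mean=75, pop_sd=10, n=25)
print(f"Z = {z:.2f}, cutoff = {cutoff:.3f}, reject null: {reject}")
```

With these made-up numbers the standard error is 2, so Z = (79 - 75) / 2 = 2.0, which is past the 1-tailed 0.05 cutoff, and the null hypothesis would be rejected.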
Power
The final part of this unit briefly discusses the concept of power. Power is the probability that you will reject your null hypothesis when you are supposed to, that is, when the research hypothesis is actually true (Aron et al., 2013). We know that at the end of a hypothesis testing problem we have 2 choices: reject or fail to reject the null hypothesis. These choices, however, do not guarantee anything about what is actually true. It is entirely possible you did everything correctly in your experiment but still got the wrong answer. If your sample told you the scores were different, you might reject your null hypothesis even though the sample was not representative. We could make 2 possible errors: type I error and type II error. Type I error is rejecting the null hypothesis when we are not supposed to; in other words, we said the experiment worked when it did not (Aron et al., 2013). This may sound familiar, and that is because this is the basis of the α levels: α is the probability of making a type I error. Type II error is failing to reject the null hypothesis when we were supposed to; in other words, the experiment did work but we said it did not (Aron et al., 2013). The problem in research is that one never knows whether a type I or type II error was actually made. One key is to limit sources of error in the data, such as confounding variables. The other key is to increase the effect size. Effect size is the degree to which 2 distributions do not overlap each other (Aron et al., 2013). The less they overlap, the more different they are, which increases power.
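To make power concrete, here is a rough sketch for a 1-tailed Z test on a sample mean that predicts an increase. Everything numeric is hypothetical: a null mean of 75, a population SD of 10, a sample of 25, and an assumed true mean for the students who study the new way. The function name is also invented for this illustration.

```python
import math
from statistics import NormalDist

def power_one_tailed(null_mean, true_mean, pop_sd, n, alpha=0.05):
    """Power of a 1-tailed Z test that predicts an increase in the mean."""
    sigma_m = pop_sd / math.sqrt(n)               # standard error
    z_cut = NormalDist().inv_cdf(1 - alpha)       # cutoff on the comparison distribution
    cutoff_mean = null_mean + z_cut * sigma_m     # the same cutoff as a raw sample mean
    # Chance that a sample mean lands past the cutoff when the true mean is true_mean.
    return 1 - NormalDist(true_mean, sigma_m).cdf(cutoff_mean)

small_effect = power_one_tailed(75, 79, 10, 25)   # assumed true mean of 79
large_effect = power_one_tailed(75, 81, 10, 25)   # assumed true mean of 81
print(f"power with true mean 79: {small_effect:.2f}")
print(f"power with true mean 81: {large_effect:.2f}")
print(f"type II error rate with true mean 79: {1 - small_effect:.2f}")
```

The larger assumed effect gives the higher power, which matches the point about overlap above. If the true mean is set equal to the null mean, the function returns α, which is the type I error rate.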
(CSLO 1, CSLO 2, CSLO 5, CSLO 7, CSLO 9, CSLO 10, CSLO 11)
References
Aron, A., Coups, E. J., & Aron, E. N. (2013). Statistics for psychology (6th ed.). Chapter 3.