PSY302
Chapter 2 Lecture Notes
Slide 2:
· Central tendency is a very important term to understand in statistics. It refers to the middle of a group of scores (a distribution) and expresses that middle as a single number. That single number can describe or summarize a whole dataset.
· Central tendency refers to the most typical score that can represent the distribution as a whole. The three measures are the mode, median, and mean. Each has its advantages and disadvantages in describing a dataset, but importantly, each is just one number that can give you some information about a group of scores.
Slide 3:
· Mode is the most frequently occurring score in a group of scores (a distribution). The mode can be easily determined from a frequency table, as the mode is the value with the highest frequency (or most responses). Importantly, the mode is not the frequency itself, but the value associated with the highest frequency. For example, if 8 people scored an 84 on an exam, the mode is not 8, it is 84.
· The mode is typically the least informative measure of central tendency for numeric data because you can change some scores in a distribution without affecting the mode at all, so it tells us relatively little about the distribution as a whole. However, the mode is an informative way to describe the central tendency of nominal (or categorical) variables, which cannot be described using the mean or median. For example, knowing the mode (the most frequent response) for “favorite ice cream flavor” is informative to us as researchers.
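· A minimal Python sketch of reading the mode off a frequency table. The exam scores and ice cream responses are hypothetical (they are not from the slides); `Counter` from the standard library plays the role of the frequency table.

```python
from collections import Counter

# Hypothetical exam scores (not from the slides): eight people scored 84.
scores = [84] * 8 + [90] * 5 + [72] * 3 + [65] * 2

# Frequency table: each distinct score mapped to how many times it occurs.
frequency_table = Counter(scores)

# The mode is the VALUE with the highest frequency, not the frequency itself.
mode_value, mode_frequency = frequency_table.most_common(1)[0]
print("mode:", mode_value)
print("frequency of the mode:", mode_frequency)

# The same approach works for nominal data, where mean and median do not apply.
flavors = ["vanilla", "chocolate", "vanilla", "strawberry", "vanilla", "chocolate"]
flavor_mode = Counter(flavors).most_common(1)[0][0]
print("most frequent flavor:", flavor_mode)
```

Here the mode is 84 (the score), while 8 is only how often that score occurred.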
Slide 4:
· The median is the middle score when all scores in a dataset are ordered from lowest to highest. The median is often relied on when a distribution is skewed because skew affects the mean much more than it affects the median. Therefore, the median is a better representation of the middle of a distribution whenever it is skewed.
· To determine the median, first line up all of your scores from lowest to highest. Then calculate the median position (MP), which equals (N + 1) / 2, where N is the number of scores. Lastly, count that many scores in from either end of the ordered list. If the MP is a whole number, the score you land on is the median. If the MP ends in .5, the median is the average of the two middle scores.
Slide 5:
· Example 1: MP equals (5 + 1) / 2 = 3, so the median is 13.
· Example 2: MP equals (4 + 1) / 2 = 2.5, so the median is in between the two middle scores (12 and 13). Find the average of those two scores, so the median is 12.5.
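· The median position procedure can be sketched in Python as follows. The function name `median_by_position` and the two score lists are hypothetical: the actual scores on the slide are not reproduced in these notes, so the lists were only chosen to have the same N and the same middle scores as Examples 1 and 2.

```python
def median_by_position(scores):
    """Return the median using the median position MP = (N + 1) / 2."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)        # line the scores up from lowest to highest
    n = len(ordered)
    mp = (n + 1) / 2                # median position, counting from 1
    if mp == int(mp):               # whole-number MP: land on a single score
        return ordered[int(mp) - 1]
    lower = ordered[int(mp) - 1]    # MP ends in .5: take the two middle scores
    upper = ordered[int(mp)]
    return (lower + upper) / 2      # and average them


# Hypothetical data with N = 5 (MP = 3) and N = 4 (MP = 2.5).
example_1 = median_by_position([10, 12, 13, 15, 18])
example_2 = median_by_position([15, 10, 13, 12])
print(example_1, example_2)
```

With these lists, the first call lands on 13 and the second averages 12 and 13 to give 12.5.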
Slide 6:
· The mean is the most common and, in most cases, the most informative measure of central tendency. It is simply the average of all scores in a distribution. To calculate it, sum all of the scores and divide that sum by the total number of scores in the distribution: M = ΣX / N. Refer to the formula on this slide. The symbol Σ is called “sigma” and means “the sum of”. X refers to each individual score in a distribution, and N is the total number of scores in the dataset. So the formula reads: the sum of all scores divided by the total number of scores.
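· As a small illustration, the formula M = ΣX / N translates almost word for word into Python. The quiz scores are hypothetical, and the variable names simply mirror the symbols described above.

```python
# Hypothetical quiz scores (not from the slides).
scores = [7, 8, 9, 10, 6]

sum_of_x = sum(scores)   # ΣX: the sum of all scores
n = len(scores)          # N: the total number of scores
m = sum_of_x / n         # M = ΣX / N

print("sum of X =", sum_of_x, "N =", n, "M =", m)
```

For these five scores, ΣX is 40 and N is 5, so M is 8.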
Slide 7:
· All scores in a distribution are included in the calculation of the mean, which makes it very representative of the data. Whenever researchers conduct a study, we have to understand that our participants are going to respond differently. In fact, that variety is a good thing because it is what allows a normal distribution to form. Yes, many of the responses will be in the middle, but it is also good to fill out the tails of the distribution. Because there are always many different responses, the mean is a strong way to represent and summarize them with just one number.
· However, the mean is not without disadvantages. The mean is susceptible to extreme scores, called outliers. An outlier is a score that is very different from the majority of scores, so extreme that it sits far out in the tail of a normal distribution. Outliers can substantially change the mean of a distribution because, again, every score in the distribution is used to calculate the mean.
· For example, calculate the mean of these five scores: 1, 2, 3, 4, 5. M = 3. But now let’s replace one of the scores with an outlier. Now calculate the mean of these five scores: 1, 2, 3, 4, 20. M = 6. The mean doubled based on one score, the outlier, and now does not represent the dataset as well as it should.
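· The example above can be checked with Python's standard `statistics` module. The comparison with the median is added here only for illustration; it shows why the median is preferred when an outlier is present.

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 20]   # the 5 has been replaced by an outlier

mean_before = mean(original)
mean_after = mean(with_outlier)
median_before = median(original)
median_after = median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean moves from 3 to 6, while the median stays at 3 in both sets of scores.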
Slide 8:
· We can compare how the mean, median, and mode relate to each other in a distribution. In a normal distribution, the mean, median, and mode are all the same and sit right in the middle of the distribution. However, these three measures of central tendency can change position depending on the distribution’s shape.
· When a distribution is skewed, the mean is always pulled toward the tail of the distribution because those extreme positive or negative scores strongly influence the mean.
· In a positively skewed distribution, the mean is pulled to the right (toward the more positive scores), so the median is to the left of the mean. If you think of the x-axis as a number line, the median is lower (smaller) than the mean.
· In a negatively skewed distribution, the mean is pulled to the left (toward the more negative scores), so the median is to the right of the mean. Again, thinking of the x-axis as a number line, the median is higher (larger) than the mean.
· The mode is not affected by outliers at all, and therefore always lands at the peak of the distribution.
Slide 9:
· (a) is a negatively skewed distribution. As you can see, the mean is furthest to the left, then the median, and then the mode (going left to right). (b) is a positively skewed distribution, where the opposite order holds. And lastly, (c) is a normal distribution, in which the mean, median, and mode all fall in the exact middle of the distribution. In this case, all of these measures of central tendency are the same.
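· A minimal sketch of the ordering described for panels (a) and (b), using small hypothetical datasets (not the distributions pictured on the slide). The negatively skewed list is just the mirror image of the positively skewed one, and the helper name `describe` is made up for this example.

```python
from statistics import mean, median, mode

# Hypothetical scores with a long tail to the right (positive skew).
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same scores: long tail to the left (negative skew).
negatively_skewed = [15 - x for x in positively_skewed]


def describe(scores):
    """Return the three measures of central tendency for a list of scores."""
    return {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}


pos = describe(positively_skewed)
neg = describe(negatively_skewed)
print("positive skew:", pos)
print("negative skew:", neg)
```

For these particular lists, the positively skewed data gives mode < median < mean, and the negatively skewed data gives mean < median < mode, matching the left-to-right order described above.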
Slide 10:
· Now that we have discussed central tendency (the middle of a distribution), we also have to talk about the variability of a distribution. Researchers want to know not only about the middle of a distribution but also about its spread. The question of variability is “how much do the scores in a distribution vary from each other?” There are two measures of variability: variance and standard deviation.
· Variance is the average of the squared deviation scores from the mean. In other words, it describes how much the scores in a distribution vary around the mean. Two important terms you will need to understand in order to calculate the variance are deviation scores and the sum of squares (SS).
Slide 11:
· Here is a depiction of variance. In the top panel, you see a normal distribution without much variance. This distribution illustrates a neighborhood where most of the houses are around the same average price. In the bottom panel, however, there is a lot of variance. This distribution illustrates a neighborhood where the houses are of all different prices, so the distribution is much more spread out, indicating more variance.
Slide 12:
· See the steps and formula for computing variance on this slide. As you will see in the lab lecture video this week, you will need to create a three-column table anytime you need to find variance (this table is crucial and we will be using it for the whole semester). The objective when making this three-column table is to find the sum of squares (SS), which is equal to Σ(X - M)² and which you will need in order to find variance. Watch the lab lecture video for a tutorial on how to make this critical three-column table.
· The symbol for variance (for now) is SD², and it is calculated as SD² = Σ(X - M)² / N = SS / N. Again, Σ is the “sum of”, X is an individual score, M is the mean, and N is the total number of scores. It is incredibly important to know (memorize) these symbols (and there are more to come in later chapters).
· Variance is just one measure of variability, but it is not often relied upon because it is based on squared deviation scores. It is therefore expressed in squared units rather than the original units of the scores, which can misrepresent how spread out a distribution really is. The next measure of variability, standard deviation, is used more often.
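· A sketch of the variance calculation in Python, taking SS as the sum of the squared deviation scores, Σ(X - M)², and dividing by N. The scores are hypothetical, and the column layout (X, X - M, (X - M)²) is an assumption about the three-column table; follow the lab lecture video for the exact layout used in this course.

```python
# Hypothetical scores (not from the slides).
scores = [2, 4, 4, 6, 9]

n = len(scores)          # N: total number of scores
m = sum(scores) / n      # M: the mean

# Assumed three-column table: X, deviation score (X - M), squared deviation.
table = [(x, x - m, (x - m) ** 2) for x in scores]

print("X\tX - M\t(X - M)^2")
for x, deviation, squared in table:
    print(f"{x}\t{deviation}\t{squared}")

# Sum of squares: add up the third column.
ss = sum(squared for _, _, squared in table)

# Variance: SS divided by N.
variance = ss / n
print("SS =", ss, "variance =", variance)
```

Note that the middle column (the deviation scores) always sums to zero, which is why the deviations are squared before they are added up.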
Slide 13:
· Standard deviation is, roughly, the average amount that scores differ from the mean. It lets you express, in a standardized way, how far above or below the mean a particular score is. This is important for identifying outliers, as some researchers set a cutoff point for what makes a score an outlier. Typically, if a score is 3 or more standard deviations above or below the mean, it would be considered an outlier because it is so different from the mean.
Slide 14:
· The symbol for standard deviation (for now) is SD, and it is calculated simply by taking the square root of the variance (SD²). IMPORTANT: if you know the variance, you can take its square root to find the standard deviation, and likewise, if you know the standard deviation, you can square it to find the variance. If you have one of these measures, you can always find the other.
· Standard deviation will be crucial to us throughout the semester and we will be labeling distributions using standard deviation.
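· A minimal sketch of standard deviation as the square root of the variance, plus the 3 standard deviation outlier cutoff mentioned on Slide 13. The scores and the function names `standard_deviation` and `flag_outliers` are hypothetical, and the cutoff of 3 is only the typical convention described above, not a fixed rule.

```python
import math


def standard_deviation(scores):
    """SD = square root of the variance, where variance = SS / N."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)   # sum of squares
    return math.sqrt(ss / n)


def flag_outliers(scores, cutoff=3):
    """Return scores that are `cutoff` or more SDs above or below the mean."""
    m = sum(scores) / len(scores)
    sd = standard_deviation(scores)
    return [x for x in scores if abs(x - m) >= cutoff * sd]


# Hypothetical scores: twenty scores near 50 and one extreme score of 90.
scores = [48, 49, 50, 51, 52] * 4 + [90]

sd = standard_deviation(scores)
variance = sd ** 2        # squaring SD recovers the variance
outliers = flag_outliers(scores)

print("SD =", round(sd, 2), "variance =", round(variance, 2))
print("outliers:", outliers)
```

In this dataset only the score of 90 is more than 3 standard deviations from the mean, so it is the only score flagged.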
Important Reading
· Pgs. 34-49; 56
· Link to lab lecture (example problems): https://youtu.be/veOb_ZLtNuE