H.Chapter2Salkind_acc.pptx

Chapter Two (Salkind)

Means To An End

Computing and Understanding Averages

Welcome!

I hope you enjoyed our detour into Smith and Davis, with our focus on research ideas, research questions, variables and control, and cultural issues in research methods. We will return to Smith and Davis next semester, so you can put away that book for now

Now we get back to statistics using the Salkind textbook! Are you ready! (No worries – you might actually find this enjoyable!)

In this chapter, we will discuss our first mechanism for really understanding the organization of our data: descriptive statistics


Descriptive vs Inferential Statistics

Recall Descriptive Statistics from Salkind’s Chapter 1

Descriptive statistics organize and describe the characteristics of a collection of data (or data sets)

These statistics are very “descriptive” in nature

Contrast them with inferential statistics, where we try to make inferences from a small group (a sample) to a larger group (a population)

Part of the descriptive nature of statistics involves assessing the very nature of the data set in hand. This might involve searching for a single number around which the data “pool” together. Yep, I am talking about an “average” score!


Averages

Our first descriptive statistic task is figuring out a single value that best represents and describes an entire group of scores

Consider an average score. The average on our last quiz might be 95% or it might be 65%. Which score is better?

If you focus on 95%, this number can tell you a lot about the intelligence of the class, the simplicity of the exam, or the great teaching ability of the instructor!

If you focus on 65%, that number might tell you about the class’s lack of intelligence or the toughness of the test (the teaching ability of the instructor is always great!)


But an average isn’t the only number that can clue us in to the nature of the data set. In this chapter, we will look at several different “Measures of Central Tendency”

The mean

The median

The mode

Measures of Central Tendency


An Overview of This Chapter

In this chapter we cover the following items …

Part One: Scales of Measurement

Part Two: Computing the Mean

Part Three: Computing the Median

Part Four: Computing the Mode

Part Five: When to Use What

Part Six: Using the Computer and Computing Descriptives

Part Seven: An Eye Toward The Future


Part One

Scales of Measurement

Scales of Measurement – NOIR

You won’t find much information about scales of measurement in your Salkind textbook (Smith and Davis cover them nicely in Chapter 9, which we will get to in Research Methods and Design II next semester), but it is important to bring them up at least a little right now, as these scales will come in handy in this chapter

In a nutshell, specific statistical tests are based on specific kinds of variables. Some tests, like a t-Test or ANOVA, require that our dependent variable be assessed on a continuous scale (ratio or interval). Other tests (like a chi square) can use DVs that are nominal or ordinal.

Scales of Measurement

NOIR

Scales of Measurement – NOIR

Nominal

A nominal scale is categorical in nature

Ordinal

An ordinal scale includes rankings

Interval

An interval scale includes rankings and a continuous scale

Ratio

A ratio scale includes rankings, a continuous scale, and a zero point

Nominal

Nominal

A nominal scale is categorical in nature

Think about a yes / no answer option here, or a multiple choice response. You have different response options, but one is not necessarily better or worse than another

For example, if I asked people their ethnicity, they could say Caucasian, Hispanic, Asian, etc. For this variable, it doesn’t make sense to say one is worse than the other. Rather, they are merely different in terms of a category

Ordinal

Ordinal

An ordinal scale includes rankings

Think about a rank order here, like first place, second place, and third place in a race

Although the order (for ordinal) is set (1st, 2nd, 3rd), 1st and 2nd place might be really close (crossing the finish line within seconds of each other) while 3rd place is far behind (by two minutes)

Here, ALL you know is the order. The distances between the listed items might vary

Interval

Interval

An interval scale includes rankings and a continuous scale

Interval variables include a ranking plus they have set distances between items.

Think about a scale asking, “How frustrated are you?”

Here, you have a ranking (1 is lower than 2, etc.) AND it has set distances (Same distance between 1 and 2 as 4 and 5)

1 2 3 4 5
Not at all Somewhat Very


Ratio

Ratio

A ratio scale includes rankings, a continuous scale, and a zero point

This is a lot like an interval scale, but here it has a true zero point

For example, if I test you on 20 questions, the worst you can get is 0 out of 20. You can’t get lower than that!

Coming up in Methods and Design II

We will delve more into the four NOIR scales next semester in Research Methods and Design II, but I wanted to introduce the idea here, as the mean, median, and mode often rely on which scale of measurement the researcher uses.

Part Two

Computing the Mean

Computing The Mean

Measures of Central Tendency – The Mean

The mean is the average of all scores in a distribution

This value is dependent on each score in a distribution

It is the most widely used and informative measure of central tendency.

Okay, are you ready for our first formula for the semester? Yay!

Measures of Central Tendency – The Mean

The mean is the average of all scores in a distribution

The mean is the most widely used and informative measure of center, and is often expressed as:

X̄ = ∑X / n

X̄ = Often called the “X bar” (this is “the mean”)

∑ = The Greek letter “Sigma” (the sum of numbers)

X = The individual scores in the data set

n = The size of the sample from which you compute the mean

The Mean Formula

Example: Mean

Suppose we look at the conviction rates of spousal abusers arrested by Miami police officers. We have a sample of ten police officers who arrest abusers over a one year period.

Using data from ten police officers (Subjects 1 through 10, or S1 through S10), we can examine the “average” number of convictions across all ten officers over this period …

# Convictions
S7 65 +
S5 24 +
S3 12 +
S4 12 +
S9 8 +
S2 6 +
S8 4 +
S10 2 +
S1 1 +
S6 1 =

135

The Mean for the # Convictions = 135 / 10 = 13.5

Example Mean Calculation
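
If you want to check this by computer, here is a minimal sketch in Python. It is only an illustration (nothing in this course assumes Python), and the variable names are made up for the example. The data are the ten conviction counts from the table above.

```python
# Conviction counts for officers S1 through S10 (from the table above)
convictions = [1, 6, 12, 12, 24, 1, 65, 4, 8, 2]

total = sum(convictions)   # the "sigma X" part: add up every score
n = len(convictions)       # n: the size of the sample
mean = total / n           # X-bar = sigma X / n

print(total, n, mean)      # 135 10 13.5
```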

Woohoo!

Measures of Central Tendency – The Mean

Yea! We just finished our first statistical formula. Easy, right?

Of course, there are a few things to remember …

The Mean: X̄ and M

Measures of Central Tendency – The Mean

Things to Remember

In the prior formula for the mean, the mean was represented by the symbol X̄. Sometimes, though, you will see the letter M

For example, in the sentence, “The participants found less evidence of injury in the car crash case (M = 5.67) than the bus crash case (M = 8.98)”, you know the average score for the car crash case is 5.67 compared to the average 8.98 score in the bus crash case.

In mathematical formulas, a small n usually represents the size of a sample while a capital N represents the size of the population.

Sample: X̄ = ∑X / n

Population: µ = ∑X / N

N vs n in Mean Formulas

Arithmetic Mean

Measures of Central Tendency – The Mean

Things to Remember

The sample mean often reflects the population mean (but not always – a representative sample always helps!)

The mean is the balance point of a data set: if you subtract the mean from every score, those deviations add up to zero (this is a property of the arithmetic mean)

Arithmetic Mean Example

Imagine a mean of 4: (3 + 4 + 5 = 12, and 12 / 3 = 4).

Now, find the difference between each score and the mean (3 – 4 = -1, 4 – 4 = 0, 5 – 4 = 1)

Now add up the deviations from the mean

-1 + 0 + 1 = zero!

In other words, some scores in the dataset are above the mean, some are below and if you add up those differences you’ll get zero!
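
A short Python sketch of the same idea, offered as an illustration only. The scores 3, 4, and 5 come from the example above; the variable names are assumptions made for this sketch.

```python
scores = [3, 4, 5]

mean = sum(scores) / len(scores)          # 12 / 3 = 4.0
deviations = [x - mean for x in scores]   # each score minus the mean

print(deviations)        # [-1.0, 0.0, 1.0]
print(sum(deviations))   # 0.0
```

Try swapping in any other set of scores: the deviations from the mean will still add up to zero (allowing for tiny rounding error when the mean is not a tidy number).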

The mean is sensitive to outliers!

Just remember that the mean is sensitive to extreme scores (high and low)

Consider our police officer data again …

# Convictions
S7 65 +
S5 24 +
S3 12 +
S4 12 +
S9 8 +
S2 6 +
S8 4 +
S10 2 +
S1 1 +
S6 1 =

Mean for the # Convictions = 13.5

Note that only 2 out of the 10 officers are above the mean

135

S7 and S5 are above the mean

# Convictions
S3 12 +
S4 12 +
S9 8 +
S2 6 +
S8 4 +
S10 2 +
S1 1 +
S6 1 =

Now the Mean = 46 / 8 = 5.75

Our mean score drops from 13.5 to just 5.75 if we get rid of those two outliers (the really high scores).

The mean is impacted by EVERY score in the data set
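
To see how much two scores can move the mean, here is a small Python sketch (an illustration only; the dictionary layout and the helper name `mean` are assumptions for this example). With S7 and S5 removed, the remaining eight scores total 46, and 46 / 8 = 5.75.

```python
# Conviction counts keyed by officer (from the tables above)
convictions = {
    "S1": 1, "S2": 6, "S3": 12, "S4": 12, "S5": 24,
    "S6": 1, "S7": 65, "S8": 4, "S9": 8, "S10": 2,
}

def mean(values):
    """Arithmetic mean: the sum of the scores divided by how many there are."""
    return sum(values) / len(values)

all_scores = list(convictions.values())
# Drop the two really high scores (officers S7 and S5)
without_outliers = [v for k, v in convictions.items() if k not in ("S7", "S5")]

print(mean(all_scores))         # 13.5
print(mean(without_outliers))   # 5.75
```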

46

What if we remove them?

What is the mean for this group of scores?

A). 96.76

B). 101.12

C). 106.93

D). 107.29

E). 111.13

IQ
89
92
103
104
108
121
134

Pop-Quiz 1: Quiz Yourself

IQ
89
92
103
104
108
121
134

What is the mean for this group of scores?

A). 96.76

B). 101.12

C). 106.93

D). 107.29

E). 111.13

Answer 1: D

Pop-Quiz 2: Quiz Yourself

What does this symbol … ∑ … mean?

A). Delta – the sum total

B). Alpha – the average

C). Sigma – the sum total

D). Eta – the average

E). Mu – the sum total

Answer 2: C

What does this symbol … ∑ … mean?

A). Delta – the sum total

B). Alpha – the average

C). Sigma – the sum total

D). Eta – the average

E). Mu – the sum total

Yay!

Congratulations, you got through your first statistical formula in this course! Not too bad, huh!

Weighted Mean

Computing a Weighted Mean

Now, sometimes listing ALL scores can be a huge chore. A weighted mean is much easier to calculate.

Here, multiply the value of a score by the frequency of that score’s occurrence. Then add up those products and divide by the total number of scores. How about an example?

Imagine Exam grades (out of 20). Nine students got 20, eight got 19, five got 18, seven got 17, fourteen got 16, two got 15, one got 14, and two got 12

Now we could use the standard mean formula, but it is a lot of scores to add up by hand!

9 students (20): 20 + 20 + 20 + 20 + 20 + 20 + 20 + 20 + 20

8 students (19): 19 + 19 + 19 + 19 + 19 + 19 + 19 + 19

5 students (18): 18 + 18 + 18 + 18 + 18

7 students (17): 17 + 17 + 17 + 17 + 17 + 17 + 17

14 students (16): 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16

Etc.! The total is 833 across 48 students, so M = 833 / 48 = 17.35

Weighted Mean Example

Value (out of 20) Frequency Count Value X Frequency
20 9 (there are 9 students with scores of 20) 180 (20 x 9 = 180)
19 8 152
18 5 90
17 7 119
16 14 224
15 2 30
14 1 14
12 2 24
Total

Computing The Weighted Mean

Computing The Weighted Mean (2)

Value (out of 20) Frequency Count Value X Frequency
20 9 180
19 8 152
18 5 90
17 7 119
16 14 224
15 2 30
14 1 14
12 2 24
Total (Total # of Students) 48 833
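
The table above can be sketched in Python as follows. This is an illustration under simple assumptions: the frequency table is stored as a dictionary (score mapped to number of students), and the variable names are invented for the example.

```python
# score (out of 20) -> number of students who earned it
frequencies = {20: 9, 19: 8, 18: 5, 17: 7, 16: 14, 15: 2, 14: 1, 12: 2}

# Value X Frequency, summed over every row of the table
weighted_total = sum(value * count for value, count in frequencies.items())
# Total number of students
n_students = sum(frequencies.values())

weighted_mean = weighted_total / n_students

print(weighted_total, n_students, round(weighted_mean, 2))   # 833 48 17.35
```

The answer matches what you would get by typing out all 48 scores and using the standard mean formula; the weighted version just saves the typing.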

Reporting the Mean

When reporting means, use Roman letters (like X̄ or M) for a sample and Greek letters (like µ) for populations.

“Children with high creativity scores (X̄ = 25) were chosen from among their peers (µ = 176) to participate in the study.”

or

“Children with high creativity scores (M = 25) were chosen from among their peers (µ = 176) to participate in the study.”

Which symbol is most appropriate for the sample mean?

A). X

B). X̄

C). ∑

D). µ

Pop-Quiz 3: Quiz Yourself

Answer 3: B

Which symbol is most appropriate for the sample mean?

A). X

B). X̄

C). ∑

D). µ

M would also be acceptable!

Part Three

Computing the Median

The Median

Measures of Central Tendency – The Median

The median is the central score in an ordered distribution (or the middle score, with half above and half below)

Unlike the mean, the median is relatively insensitive to outliers (extreme high or low scores don’t affect it much)

Half the scores fall above the median; half fall below

It is best used when …

data are “ordinal” (that is, they are ranked) OR

data are interval but do not meet the statistical requirements needed for the mean (for example, outliers are present)

How to Compute the Median

Steps in calculating the median

1). List all values IN ORDER (either highest to lowest or lowest to highest is fine, but it must be in order)

2). Find the middle-most score. That’s the median! (With an even number of scores, average the two middle scores)

Consider our officer data again. Let’s say that we have both arrest rates for each officer (over a twelve month period) as well as the conviction rates for the spousal assaulters the officers arrested …

The Median

Spousal assault cases over twelve months for ten police officers who responded to the calls

What is the Median for the # Arrests

Ok, it may be easier to reorder the values first!

Let’s look at arrests

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 1
S7 84 65
S8 5 4
S9 26 8
S10 8 2

Example Median Data

# Arrests
S7 84
S4 62
S3 48
S5 26
S6 26
S9 26
S2 9
S10 8
S1 5
S8 5

Arranged from high to low

Median for the # Arrests = 26 (the two middle scores are both 26)

Half the scores fall above; half below

Example Median Calculation


# Arrests
S7 84
S4 62
S3 48
S5 26
S6 26
S9 12
S2 9
S10 8
S1 5
S8 5

Now what is the Median for the # Arrests?

Add middle numbers

26 + 12 = 38

Divide by two

38 / 2 = 19

Another Median Calculation

# Arrests for 11 officers
S7 84
S4 62
S3 48
S5 26
S6 26
S9 12
S2 9
S10 8
S1 5
S8 5
S11 4

The Median

Now, given an 11th officer (S11), what is the Median for the # Arrests?

Median for the # Arrests = 12

Another Median Calculation with 11 Officers
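
The two steps for the median (order the scores, then find the middle) can be sketched in Python like this. It is a hand-rolled illustration, not the only way to do it, and the function and variable names are assumptions for the example. The data are the 10-officer and 11-officer arrest counts from the last two examples.

```python
def median(scores):
    ordered = sorted(scores)      # step 1: list all values in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd number of scores: one middle score
        return ordered[middle]
    # even number of scores: average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

ten_officers = [84, 62, 48, 26, 26, 12, 9, 8, 5, 5]
eleven_officers = ten_officers + [4]   # add officer S11

print(median(ten_officers))      # 19.0  (26 + 12 = 38, and 38 / 2 = 19)
print(median(eleven_officers))   # 12
```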

Percentile

The median can utilize percentile points (the percentage of cases equal to and below a certain point in a distribution)

A score in the 95th percentile means that the score is at or above 95 percent of all scores in the distribution

The median is at the 50th percentile (often called the Q2).

Q1 is the 25th percentile while Q3 is the 75th percentile
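
For the curious, here is a hedged Python sketch showing that Q2 and the median are the same number. It uses `statistics.quantiles` from Python's standard library (version 3.8 or later) with the seven IQ scores from the pop quizzes in this chapter. One caution: software packages use slightly different rules for locating Q1 and Q3, so those two values may not match a hand calculation exactly.

```python
import statistics

iq = [89, 92, 103, 104, 108, 121, 134]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(iq, n=4, method="inclusive")

print(q1, q2, q3)
print(q2 == statistics.median(iq))   # True: Q2 is the median
```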

The Median is Not Sensitive to Outliers

Unlike the mean, the median is not sensitive to outliers. This is important, as the median is not affected by really high or really low numbers to the same extent as the mean.

Recall our officer arrest rates

What is the mean and what is the median in the following data set? …

# Arrests
S7 84
S4 62
S3 48
S5 26
S6 26
S9 26
S2 9
S10 8
S1 5
S8 5

What was the Median for the # Arrests again? 26

What is the Mean?

299 / 10 = 29.9

The mean is larger than the median, probably because it is being pulled higher by that large 84 for subject #7.

Remove subject #7 and the mean becomes 23.89 (215 / 9) while the median remains 26!

Mean vs. Median
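
Here is a quick Python check of that comparison, as a sketch only (the variable names are made up). It uses the `statistics` module from Python's standard library and the ten arrest counts above, then drops the 84 for subject #7.

```python
import statistics

arrests = [84, 62, 48, 26, 26, 26, 9, 8, 5, 5]
without_s7 = arrests[1:]   # everything except the first score (the 84)

# With all ten officers: mean 29.9, median 26
print(round(statistics.mean(arrests), 2), statistics.median(arrests))

# Without subject #7: the mean falls to about 23.89, the median stays at 26
print(round(statistics.mean(without_s7), 2), statistics.median(without_s7))
```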

More Examples

Let’s see how this affects a score with even greater outliers in our officer conviction rates.

Again, consider our table …

# Convictions
S7 65
S5 24
S3 12
S4 12
S9 8
S2 6
S8 4
S10 2
S1 1
S6 1

Median for the # Convictions = 7

But is 7 a good middle point for data that ranges from 1 to 65?

Medians do not take outliers into account!

More Median Examples

In this example, the range of arrests differs but the medians are the same

Median is 14 for both

Mean Arrests 1 =13.3

Mean Arrests 2 = 25.8

The mean for arrests 2 takes into account those three big numbers for S8, S9, and S10

# Arrests 1 # Arrests 2
S1 1 3
S2 4 4
S3 6 4
S4 9 4
S5 14 14
S6 14 14
S7 18 27
S8 19 47
S9 22 59
S10 26 82

Mean vs. Median (again)
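
The same comparison as a small Python sketch (illustration only; the names are assumptions). The Arrests 1 column adds up to 133 and the Arrests 2 column adds up to 258, so with ten officers each the means are 13.3 and 25.8, while both medians are 14.

```python
import statistics

arrests_1 = [1, 4, 6, 9, 14, 14, 18, 19, 22, 26]
arrests_2 = [3, 4, 4, 4, 14, 14, 27, 47, 59, 82]

for label, data in (("Arrests 1", arrests_1), ("Arrests 2", arrests_2)):
    print(label,
          "sum =", sum(data),
          "mean =", round(statistics.mean(data), 1),
          "median =", statistics.median(data))
```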

Reporting the Median

Always keep in mind that outliers distort, or skew, the central point of a data set. This impacts the mean, but not the median

In journal articles, you might see the median expressed as “Med” or “Mdn”.

Pop-Quiz 4: Quiz Yourself

What is the median for this group of scores?

A). 92

B). 103

C). 104

D). 108

E). 121

IQ
89
92
103
104
108
121
134

Answer 4: C

What is the median for this group of scores?

A). 92

B). 103

C). 104

D). 108

E). 121

IQ
89
92
103
104
108
121
134

Pop-Quiz 5: Quiz Yourself

What is the median for this group of scores?

A). 92

B). 103

C). 104

D). 106

E). 121

IQ
89
92
103
104
108
121
134
143

Answer 5: D

What is the median for this group of scores?

A). 92

B). 103

C). 104

D). 106

E). 121

(104 + 108) = (212 / 2) = 106

IQ
89
92
103
104
108
121
134
143

Part Four

Computing the Mode

The Mode

The mode is the most frequent (commonly occurring) score in a distribution. As such, it is the simplest measure of center to calculate, but also the least precise

Scores other than the most frequent are not considered

Neglects the magnitude of scores in the distribution

Most often associated with nominal scales

Computing The Mode

Steps for computing the mode

1. List all distribution values (but list each value only once)

2. Tally the number of occurrences for each value

3. The value that occurs most frequently is the mode!
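
These three steps map neatly onto `collections.Counter` from Python's standard library. The sketch below is an illustration only, using the ten arrest counts from the median example earlier; the variable names are made up.

```python
from collections import Counter

arrests = [5, 9, 48, 62, 26, 26, 84, 5, 26, 8]

# Steps 1 and 2: each distinct value is listed once, with its tally
tally = Counter(arrests)

# Step 3: the value with the biggest tally is the mode
mode, count = tally.most_common(1)[0]

print(mode, count)   # 26 3
```

Note that `most_common(1)` reports only one value, so it will not warn you if two values are tied for most frequent.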

Rather than a numeric “score”, the mode is based more on category membership. Just because an item is in Category A doesn’t mean it is necessarily better than a Category B item. That is, we use a nominal scale of measurement

The Mode is Good for Nominal Data

Examples of Nominal Data:

Most purchased food on a menu

Pizza, hamburgers, salad, fish, meat loaf, etc.

Number of men and women in our class

Males, females

Vehicles on the road

Trucks, SUV’s, compacts, sedans, minivans, etc.

Genres of movies

Horror, Comedy, Action, Romance

What is the Mode for the # Arrests: 26

What is the Mode for # Convictions? 11

Okay, one more …

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 11
S4 62 11
S5 26 24
S6 26 9
S7 84 65
S8 5 4
S9 26 8
S10 8 2

Example Mode Calculation

Pop-Quiz 6: Quiz Yourself

What is the mode in this data set?

A). 656

B). 456

C). 405

D). 399

E). None of these

Year in College Number or Frequency
Freshman 656
Sophomore 456
Junior 405
Senior 399

Answer 6: E

What is the mode in this data set?

A). 656

B). 456

C). 405

D). 399

E). None of these

If you said “656”, you are … WRONG! The mode is Freshman. There are more freshmen than students in any other category (656 is just how often that category occurs)

Year in College Number or Frequency
Freshman 656
Sophomore 456
Junior 405
Senior 399
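
A tiny Python sketch of the same point (illustration only; the dictionary name is an assumption). When the data arrive as a frequency table, the mode is the LABEL with the largest count, not the count itself.

```python
# category label -> how many students fall in that category
year_counts = {"Freshman": 656, "Sophomore": 456, "Junior": 405, "Senior": 399}

# Pick the label whose count is largest
mode = max(year_counts, key=year_counts.get)

print(mode)                # Freshman
print(year_counts[mode])   # 656 (the frequency, which is NOT the mode)
```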

Categories

Category membership is very important here, so make sure to focus on the label of a category rather than the number of times a category occurs.

Yet sometimes it is easy to get confused when the category label is a number itself. Our officer data is like this. The most frequent conviction category might be 11 convictions (that category occurs most frequently), but the number 11 is actually more important as a category than as a number. Suppose I gave out colors rather than numbers for the officer data …

What is the Mode for the # Arrests?

What is the mode for convictions?

# Arrests # Convictions
S1 red pink
S2 green black
S3 brown red
S4 blue red
S5 purple maroon
S6 purple yellow
S7 beige grey
S8 maroon orange
S9 yellow aqua
S10 orange white

Another Example Mode Calculation

What is the Mode for the # Arrests? Purple!

What is the mode for convictions? Red!

The frequency of CATEGORIES matters, not necessarily numbers

Of course, red and purple arrests are odd to think about, so let’s go back to numbers again.

# Arrests # Convictions
S1 red pink
S2 green black
S3 brown red
S4 blue red
S5 purple maroon
S6 purple yellow
S7 beige grey
S8 maroon orange
S9 yellow aqua
S10 orange white

Another Example Mode Calculation: Answer

More than one mode?

When you find the mode, sometimes you might come across multiple modes. Consider our officer “number” data …

What is the Mode for the # Arrests?

What is the Mode for the # Convictions?

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 9
S7 84 65
S8 5 4
S9 26 8
S10 8 2

Okay, now you practice! What’s the Mode?

Mode for the # Arrests = 26

Mode for the # Convictions = 12

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 9
S7 84 65
S8 5 4
S9 26 8
S10 8 2

Did you get it?

Now what is the Mode for the # Convictions?

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 9
S7 84 65
S8 5 8
S9 26 8
S10 8 2

Okay, now try this one:

Now what is the Mode for the # Convictions

12 and 8. Bimodal now!

Yes, we can have data sets with multiple modes. Even our officer color data might be bimodal …

# Arrests # Convictions
S1 5 1
S2 9 6
S3 48 12
S4 62 12
S5 26 24
S6 26 9
S7 84 65
S8 5 8
S9 26 8
S10 8 2

Uh oh! Two Modes?

What is the Mode for the # Arrests? Purple and Green

What is the mode for convictions? Black and Red

# Arrests # Convictions
S1 red black
S2 green black
S3 green red
S4 blue red
S5 purple maroon
S6 purple yellow
S7 beige grey
S8 maroon orange
S9 yellow aqua
S10 orange white

Bimodal Datasets
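
Python's standard library (version 3.8 or later) has `statistics.multimode`, which returns every value tied for most frequent. The sketch below is an illustration only; it uses the bimodal conviction numbers and the bimodal arrest colors from the last two examples, and the variable names are made up.

```python
import statistics

convictions = [1, 6, 12, 12, 24, 9, 65, 8, 8, 2]
arrest_colors = ["red", "green", "green", "blue", "purple",
                 "purple", "beige", "maroon", "yellow", "orange"]

# multimode lists ALL of the most frequent values, in order of first appearance
print(statistics.multimode(convictions))     # [12, 8]
print(statistics.multimode(arrest_colors))   # ['green', 'purple']
```

It works the same way for numbers and for category labels, which fits the idea that the mode is about category membership.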

Things to Remember about The Mode

Just remember that categories must be mutually exclusive

That is, items cannot belong to more than one category at a time when you focus on the mode

You cannot be both a sophomore and a junior

You cannot be both black and red

You cannot be both male and female (though you could add another category to include transgender people)

Part Five

When To Use What

When To Use What?

When should you use the mean, median, or mode to describe your data? Which one is the best?

It depends on how the variable is measured (Remember NOIR?)

When is it best to use the mode?

If you use a categorical, qualitative, or nominal variable, you must use the mode

For example, let’s say we ask children to name their favorite color. Ten say red, three say blue, six say green.

So what is the mean color preference?

That question makes no sense, right? You could add ten and three and six to get nineteen students, and divide by … what? Is the mean color somewhere between red and blue?

When you deal with categories, use the mode!

When is it best to use the median?

If you are using data that is quantitative in nature (there is a high to low or low to high ranking of data), either the median or the mean might be best

The median is best used when there is an extreme score or outlier.

Remember, the median is less sensitive to outliers than the mean

The mean is best when there are no outliers, as it is more precise than the median

If you have a nice, normal curve (not too many high or low scores), use the mean!

When is it best to use the mean?

The mean is often thought of as an economic measurement tool while the median is a social measurement tool. Think about buying and selling real estate

The mean might be used to describe the average value of a portfolio of houses being offered for sale by a particular real estate agent (they want to include high end home values)

However, someone wanting to buy a home from a real estate agent might want to use the median or middle house value. This is because the median does not alter when there are extreme values (outliers) in a data set.

Example: When To Use What?

To illustrate, imagine two real estate agents have the following sales (in thousands of dollars, so 30 = $30,000)

Agent A: 30 40 50 60 70. Mean = 50 and median = 50

Agent B: 30 40 50 60 700. Mean = 176 and median = 50

The value 700 is an outlier for Agent B

Here, the mean is not as helpful as the median.

As a home buyer, you may be leery of seeking the help of an agent with a mean of $176,000, but not one with a median of $50,000

Yet as a home seller, your realtor might list their mean sales of $176,000 (rather than the median $50,000), as using the mean makes them look really effective!
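
The two agents can be compared with a few lines of Python. This is a sketch for illustration only, with made-up variable names; the sales figures are in thousands of dollars, as in the example above.

```python
import statistics

agent_a = [30, 40, 50, 60, 70]
agent_b = [30, 40, 50, 60, 700]   # 700 is the outlier

for name, sales in (("Agent A", agent_a), ("Agent B", agent_b)):
    print(name,
          "mean =", statistics.mean(sales),
          "median =", statistics.median(sales))
```

Agent B's total is 880, so the mean is 880 / 5 = 176, while the median stays at 50 for both agents.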

Pop-Quiz 7: Quiz Yourself

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the mode?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000

Answer 7: C

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the mode?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000

Pop-Quiz 8: Quiz Yourself

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the median?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000

Answer 8: B

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the median?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000

Owner $85,000
Supervisor #1 $22,000
Supervisor #2 $22,000
Kitchen Hand #1 $16,000
Kitchen Hand #2 $16,000
Kitchen Hand #3 $16,000

Pop-Quiz 9: Quiz Yourself

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the mean?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000

Answer 9: A

Last year, a fast food outlet in a beachside city paid 3 kitchen hands $16,000 per year each, 2 supervisors $22,000 each, and the owner $85,000. What is the mean?

A). $29,500

B). $19,000

C). $16,000

D). $14,500

E). $12,000
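
If you would like to check all three fast food answers at once, here is a hedged Python sketch using the standard `statistics` module (the variable name `salaries` is an assumption for this example). The six salaries add up to $177,000, so the mean is 177,000 / 6 = $29,500.

```python
import statistics

# 3 kitchen hands, 2 supervisors, and the owner
salaries = [16_000] * 3 + [22_000] * 2 + [85_000]

print(f"mode   = ${statistics.mode(salaries):,.0f}")     # most frequent salary
print(f"median = ${statistics.median(salaries):,.0f}")   # average of 16,000 and 22,000
print(f"mean   = ${statistics.mean(salaries):,.0f}")     # pulled up by the owner's salary
```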