Descriptive Statistics

Unit 5

There are several related topics in this unit…

Descriptive Statistics Overview

Measures of Central Tendency

Measures of Dispersion

Descriptive Statistics Overview Statistics

Descriptive Statistics tell us about specific trends in our data and describe specific

features of our sample.

For example, a researcher will use descriptive statistics to tell readers about the

proportion of men and women who participated in a study. The researcher may write

something like:

“In this study 40% of the sample were men, whereas 60% were women.”

Or the researcher may inform readers about participants’ average scores on a particular

variable in the study. In this case the researcher may say:

“The mean score on the communication competence measure was 14.55.”

The primary descriptive statistics fall into one of two “families”:

measures of central tendency

measures of dispersion.

Measures of Central Tendency Statistics

Measures of central tendency, as the name implies, tell us

about a central characteristic of the data. Measures of

central tendency include…

the Mode

the Median

and the Mean

Mode Statistics

The mode is the simplest measure of central tendency.

It indicates which score or value in a distribution occurs most frequently.

The mode is appropriate when we have nominal or categorical data.

In these instances we are interested in how often each category was used or

appeared. In essence we count observations that appear in each category and

then report which category had the most observations.

Thus, the mode is the descriptive statistic that tells us which category has the

most observations or which category appears most often in the data.
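As a minimal sketch of this counting process, assume a small made-up list of nominal responses (the `favorite_medium` values below are hypothetical and not from any study). Python's standard library can tally the categories and report the mode:

```python
from collections import Counter
from statistics import multimode

# Hypothetical nominal data: each entry is one participant's category.
favorite_medium = ["text", "phone", "text", "email", "text", "phone"]

# Count how many observations fall in each category.
counts = Counter(favorite_medium)

# The mode is the category with the most observations.
# multimode returns every tied category, so ties are not hidden.
modes = multimode(favorite_medium)

print(counts)
print(modes)
```

Using `multimode` rather than `mode` is a design choice here: if two categories tie for the most observations, both are reported.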

Mode Statistics

Say, for example, that we are interested in people’s perceptions of what constitutes

sexual harassment. To determine this we could provide people with a list of behaviors

and ask them to respond by simply checking “yes” or “no” (nominal categories) if they

believe the behavior reflects sexual harassment:

1. sexual comments ____ yes ____ no

2. inappropriate gaze ____ yes ____ no

3. sexual jokes ____ yes ____ no

4. display of pornographic materials in the office ____ yes ____ no

5. “pick-up” or “come on” lines ____ yes ____ no

The mode will tell us which of these behaviors people perceived to be the most sexually

harassing compared to the others as it would reflect the category that had the most

“yes” responses. Or if we were interested in the least sexually harassing behavior, we

could count up the “no” responses and report the mode for “no” responses.
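The tallying described above might be sketched as follows. The three respondents and their answers are invented purely for illustration, only three of the five behaviors are included to keep it short, and the `tally` helper is a hypothetical name:

```python
from collections import Counter

# Hypothetical survey data: one dict of yes/no answers per respondent.
responses = [
    {"sexual comments": "yes", "inappropriate gaze": "no", "sexual jokes": "yes"},
    {"sexual comments": "yes", "inappropriate gaze": "yes", "sexual jokes": "no"},
    {"sexual comments": "yes", "inappropriate gaze": "no", "sexual jokes": "yes"},
]

def tally(responses, answer):
    """Count how many respondents gave `answer` for each behavior."""
    counts = Counter()
    for respondent in responses:
        for behavior, given in respondent.items():
            if given == answer:
                counts[behavior] += 1
    return counts

yes_counts = tally(responses, "yes")
no_counts = tally(responses, "no")

# The behavior with the most "yes" (or "no") responses is the mode
# for that response category.
most_yes = yes_counts.most_common(1)[0][0]
most_no = no_counts.most_common(1)[0][0]

print(most_yes, dict(yes_counts))
print(most_no, dict(no_counts))
```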

Median Statistics

The median divides a distribution of quantitative data exactly in

half. It is the score above which and below which half the

observations fall.

The median is most appropriate for describing the center point of

a set of ordinal data because it tells us the point at which half of

the cases rank higher and half rank lower.

For example, in a horse race the horse that finished fifth out of

nine represents the median, as four horses finished before or

above it and four horses finished behind or below it.

Median Statistics

The median also can be used with interval/ratio data, but can be problematic

because it is not sensitive to extreme scores.

That is, two distributions may have the same median or middle point, but one

could include much higher and/or lower values than the other.

Simply seeing the median would lead us to believe that the two distributions of

scores are more similar than they actually are.

For example:

4 10 12 15 28 36 47

10 11 13 15 17 21 25

The median for both distributions is 15, but the first distribution includes much

lower and higher values (from 4 to 47) than the second one (from 10 to 25).
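A short sketch that checks this example with Python's standard `statistics` module. The two lists are the distributions shown above, and the variable names `first` and `second` are just labels chosen here:

```python
from statistics import median

first = [4, 10, 12, 15, 28, 36, 47]
second = [10, 11, 13, 15, 17, 21, 25]

for label, scores in (("first", first), ("second", second)):
    # The median is the same, but the lowest and highest scores differ.
    print(label, "median:", median(scores),
          "lowest:", min(scores), "highest:", max(scores))
```

Both lines report a median of 15, while the lowest and highest scores show how differently the two distributions spread out.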

Median Statistics

Beyond simply describing the middle point of a distribution,

researchers may use the median to create groups to compare. This

is called a median split. When researchers have ordinal or interval/ratio

data but want to create groups or categories they can do so by

using a median split to create two groups.

Accordingly, the researcher determines what the median is for the

variable of interest and then creates a “high” group with scores

above the median and a “low” group with scores below the

median.

Median Statistics

For example, a researcher interested in comparing people high and low in

verbal aggressiveness can find the median of the verbal aggressiveness scores

for all participants. She can then take all of the cases above that median to

create the “high” group and all the scores below that median to create the

“low” group.

Then the researcher can compare people high and low in verbal aggressiveness

on some other variable of interest.

Do people high and low in verbal aggressiveness differ with regard to their

marital satisfaction? Their communication competence?
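A median split could be sketched like this. The participant IDs and verbal aggressiveness scores are hypothetical. The slides do not say what happens to scores that fall exactly on the median, so this sketch sets them aside as an assumption:

```python
from statistics import median

# Hypothetical verbal aggressiveness scores keyed by participant ID.
scores = {"p1": 12, "p2": 31, "p3": 25, "p4": 18, "p5": 40, "p6": 22, "p7": 25}

cut = median(scores.values())

# "High" group: scores above the median. "Low" group: scores below it.
high = [pid for pid, s in scores.items() if s > cut]
low = [pid for pid, s in scores.items() if s < cut]

# Assumption: cases exactly at the median are kept out of both groups.
at_median = [pid for pid, s in scores.items() if s == cut]

print("median:", cut)
print("high:", high)
print("low:", low)
print("at median:", at_median)
```

The two resulting groups can then be compared on another variable, such as marital satisfaction.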

Mean Statistics

The mean is the arithmetic average. It is computed by adding all

the scores in a distribution and dividing by the total number of

scores.

It helps to clarify what the average score on a variable of interest

is. For example, we may see any of the following reported.

“The mean…

“…for communication apprehension was 14.56.”

“…for hours of television watched per week was 8.56.”

“…for age of respondents in this study was 43.69 years.”

Mean Statistics

The mean is appropriate for interval/ratio data because of

“assumed equivalence” or the idea that all points on the scale are

assumed to be of equal distance from one another (i.e., 1 is the

same distance from 2 as 2 is from 3, and so on).

Unlike the other measures of central tendency,

the mean is sensitive to all scores, including extreme scores, in

the distribution.

That is why it is thought to be the most sophisticated measure of

central tendency.
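To illustrate both the computation and the sensitivity to extreme scores, here is a small sketch that reuses the two distributions from the median example. The helper name `mean` is chosen here for illustration:

```python
from statistics import median

def mean(scores):
    """Arithmetic average: add all the scores, divide by how many there are."""
    return sum(scores) / len(scores)

first = [4, 10, 12, 15, 28, 36, 47]
second = [10, 11, 13, 15, 17, 21, 25]

for label, scores in (("first", first), ("second", second)):
    print(f"{label}: mean = {mean(scores):.2f}, median = {median(scores)}")
```

The medians are identical, but the means differ (about 21.71 versus 16.00) because the mean takes the high scores in the first distribution into account.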

Measures of Dispersion Statistics

Measures of dispersion show us how data spread out in a distribution.

Think about, for example, dropping a glass of water and a can of motor oil on

the floor. Both will spill and disperse (i.e., spread out), but they will do so very

differently.

Thus, measures of dispersion tell us about how data spread out across a

distribution. They include…

the Range

Variance

and Standard Deviation

Range Statistics

The range is the simplest measure of dispersion. It reports the distance

between the highest and lowest scores in the distribution.

The range, therefore, is calculated by subtracting the lowest number from the

highest number in the distribution.

The range gives a general sense of how much the data spread out across the

distribution, which can be helpful for understanding whether a study included

a lot of variability or whether it drew from a narrow spectrum.

For example, if a researcher intends to study a communication variable across

a wide range of age groups, a sample of people aged 18-21 (a range of 3) is not

very diverse. Yet a sample of people aged 18-70 (a range of 52) is.
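The calculation can be sketched in a couple of lines. The individual ages below are made up, and only their lowest and highest values (18-21 and 18-70) come from the example above:

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

narrow_sample = [18, 19, 19, 20, 21]      # hypothetical ages, 18-21
wide_sample = [18, 25, 34, 47, 58, 70]    # hypothetical ages, 18-70

print(score_range(narrow_sample))  # 3
print(score_range(wide_sample))    # 52
```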

Range Statistics

One concern with the range is sensitivity to extreme scores. Because the range depends

only on the highest and lowest scores in the distribution, it can be misleading when “outlier” scores

exist in a distribution. Outliers are scores that are far removed from the rest of the

distribution.

In the example of age just used you could have a distribution that ranges from 18-70,

yet there is only one person aged 70 and the next closest score is actually 24. The age

of 70 makes the distribution look much larger than it actually is once you take this

outlier into account. If we exclude the outlier, which often researchers do when

necessary, the range is actually 6 as the scores spread from 18-24, not 52 as is the case

when the outlier is included and the scores spread from 18-70.

To avoid problems with outliers researchers may report the interquartile range. This

is the range of scores representing the middle 50% of scores (or the middle two quarters

of the distribution). The upper and lower 25% of scores (the outer quarters where

outliers will be) are excluded. An interquartile range provides a more conservative

representation of the range.
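A rough sketch of the outlier problem and the interquartile range, using hypothetical ages that match the scenario above (one person aged 70, everyone else between 18 and 24). Textbooks and software use slightly different conventions for locating quartiles, so the exact interquartile value depends on the method. This sketch relies on the default of `statistics.quantiles`:

```python
from statistics import quantiles

# Hypothetical ages: a single outlier (70), next closest score is 24.
ages = [18, 19, 19, 20, 21, 22, 22, 23, 24, 70]

full_range = max(ages) - min(ages)

# Range after excluding the outlier.
without_outlier = [age for age in ages if age != 70]
trimmed_range = max(without_outlier) - min(without_outlier)

# Quartiles cut the data into four quarters; the interquartile range
# spans the middle 50% of scores (third quartile minus first quartile).
q1, q2, q3 = quantiles(ages, n=4)
iqr = q3 - q1

print("range with outlier:", full_range)
print("range without outlier:", trimmed_range)
print("interquartile range:", iqr)
```

The interquartile range is computed on all ten ages, yet it is not inflated by the single 70-year-old, which is the sense in which it is more conservative.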

Variance Statistics

Variance is the average squared distance of scores from the mean, so it is expressed in squared units.

We can compute variance when we have interval or ratio data.

Why squared units? Well, that has to do with how we compute variance. To

compute variance we do the following:

1. Subtract the mean score of the distribution from each score in the distribution

2. Square each of these values

3. Sum all of the squared values

4. Divide the sum of squared values by the total number of scores
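The four steps can be sketched as a small function. The name `population_variance` and the sample scores are illustrative assumptions. Dividing by the total number of scores, as step 4 says, gives what is usually called the population variance:

```python
def population_variance(scores):
    n = len(scores)
    mean = sum(scores) / n

    # Step 1: distance of each score from the mean
    # (the sign does not matter once the values are squared).
    deviations = [score - mean for score in scores]

    # Step 2: square each of these values.
    squared = [d ** 2 for d in deviations]

    # Step 3: sum all of the squared values (the "sum of squares").
    sum_of_squares = sum(squared)

    # Step 4: divide by the total number of scores.
    return sum_of_squares / n

# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(scores))  # 4.0
```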

Variance Statistics

When computing variance we need to square the values in step two so that

they do not cancel one another out in step 3.

For example, say that we have values that are +2, +3, and +4 points above the

mean and values that are -2, -3, and -4 points below the mean. When we go to

add these up without squaring them they will cancel each other out and we will

end up with a value of zero.

To ensure this doesn’t happen we square all of the values. Thus, 2, 3, and 4

become 4, 9, and 16 regardless of whether or not they were positive or

negative values previously. This is so, as you may recall, because squaring a

negative number always produces a positive number.

Thus, all of the values are positive and can be summed for a total. This sum in

turn (known as the sum of squares) can be divided by the total number

of observations.

Variance Statistics

So, in our example above we would add

4 + 9 + 16 + 4 + 9 + 16 = 58

Then we would divide 58 by 6 (the number of observations) to obtain the

variance. In this case the variance is 9.67.
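The arithmetic in this example can be checked with a short sketch. The raw `scores` list is hypothetical: it is simply one set of scores with a mean of 10 that produces the deviations +2, +3, +4, -2, -3, and -4. Note that Python's `statistics.pvariance` divides by the number of observations, as done here, whereas `statistics.variance` divides by one less than that number (the sample variance):

```python
from statistics import pvariance, variance

deviations = [2, 3, 4, -2, -3, -4]

# Without squaring, positive and negative deviations cancel out.
unsquared_sum = sum(deviations)

# With squaring, every value is positive and the sum is meaningful.
sum_of_squares = sum(d ** 2 for d in deviations)
var = sum_of_squares / len(deviations)

print(unsquared_sum)    # 0
print(sum_of_squares)   # 58
print(round(var, 2))    # 9.67

# Hypothetical raw scores (mean = 10) that yield those deviations.
scores = [12, 13, 14, 8, 7, 6]
print(round(pvariance(scores), 2))  # 9.67, divides by 6
print(round(variance(scores), 2))   # 11.6, divides by 5
```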

You can see that this last part of the process essentially involves the

computation of a mean.

Thus, it is helpful to think of variance as the mean or average of how scores

disperse or spread out from the mean score.

Variance is a helpful measure of dispersion; however, it is of limited use on its own

because it is no longer in the original units of measure. Rather, because of the

computation necessary it ends up in squared units.

Standard Deviation Statistics

How then can we change variance into something usable and

meaningful? That is, how do we return to the original units of

measure?

Well, we need to get rid of the squared scores.

You may recall that we use the square root when we want to get

rid of squared scores. The same is true here.

We can take the square root of variance to calculate or compute a

measure of dispersion that is in the original units of measure.

This produces the standard deviation.

Standard Deviation Statistics

Standard deviation, like variance, is a measure of

dispersion that explains how much scores in a set of

interval/ratio data vary from the mean. However, unlike

variance, it is expressed in the original units of

measurement.

So, say the variance is 9.67 as was the case in our

earlier example…

…the square root of 9.67 is 3.11.

…the standard deviation therefore is 3.11.
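A quick sketch of the same step in Python. The `scores` list is a hypothetical set of raw scores whose variance is 58 / 6 (about 9.67). `statistics.pstdev` is the standard-library function that takes the square root of the variance computed with the total number of scores as the divisor:

```python
import math
from statistics import pstdev

variance = 58 / 6               # about 9.67
sd = math.sqrt(variance)        # back to the original units of measure
print(round(sd, 2))             # 3.11

# Hypothetical raw scores with that same variance.
scores = [12, 13, 14, 8, 7, 6]
print(round(pstdev(scores), 2)) # 3.11
```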

Standard Deviation Statistics

Standard deviation helps us understand how a distribution spreads out. It is

often reported alongside the mean score of a distribution.

So, for example, we may see reports that list any of the following means and

standard deviations:

M = 12.56, SD = 2.45

M = 10.21, SD = 5.64

M = 28.45, SD = 8.45

An italicized M is the statistical notation for the mean and an italicized SD is

the statistical notation for the standard deviation.

From the reports for each distribution above we would know both the

average score (M) and, roughly, the typical distance of the other scores in the

distribution from that average score (SD).

Standard Deviation Statistics

When we see the M and SD reported, we can draw some conclusions about the

distribution.

What if we saw the following descriptive statistics reported for three different

distributions of data?

M = 14.56, SD = 2.45

M = 14.56, SD = 5.64

M = 14.56, SD = 8.45

The examples above all include the same mean to make a point about the standard

deviation. In the first distribution the scores do not disperse widely, in the

second they disperse moderately, and in the third they disperse considerably.

Thus, the first distribution would appear as a tall and narrow curve, the second

as a bell-shaped curve, and the third as a broad and comparatively flat curve.
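To make the "tall and narrow" versus "broad and flat" contrast concrete, the sketch below assumes each distribution is normal (bell-shaped). The slides do not state this, so treat it as an illustration only. It uses `statistics.NormalDist` to compare the height of each curve at the mean and the interval one standard deviation on either side of it:

```python
from statistics import NormalDist

mean = 14.56
peaks = []

for sd in (2.45, 5.64, 8.45):
    # Assumption: a normal curve with this mean and standard deviation.
    curve = NormalDist(mu=mean, sigma=sd)

    # Height of the curve at its center: smaller SD -> taller peak.
    peak = curve.pdf(mean)
    peaks.append(peak)

    # Scores within one SD of the mean: larger SD -> wider interval.
    low, high = mean - sd, mean + sd
    print(f"SD = {sd}: peak height = {peak:.3f}, "
          f"one SD spans {low:.2f} to {high:.2f}")
```

As the standard deviation grows, the peak height shrinks and the one-SD interval widens, which matches the description of the three curves.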