Descriptive Statistics
Unit 5
There are several related topics in this unit…
Descriptive Statistics Overview
Measures of Central Tendency
Measures of Dispersion
Descriptive Statistics Overview
Descriptive statistics tell us about specific trends in our data and describe specific
features of our sample.
For example, a researcher will use descriptive statistics to tell readers about the
proportion of men and women who participated in a study. The researcher may write
something like:
“In this study 40% of the sample were men, whereas 60% were women.”
Or the researcher may inform readers about participants’ average scores on a particular
variable in the study. In this case the researcher may say:
“The mean score on the communication competence measure was 14.55.”
The primary descriptive statistics fall into one of two “families”:
measures of central tendency
measures of dispersion.
Measures of Central Tendency
Measures of central tendency, as the name implies, tell us
about a central characteristic of the data. Measures of
central tendency include…
the Mode
the Median
and the Mean
Mode
The mode is the simplest measure of central tendency.
It indicates which score or value in a distribution occurs most frequently.
The mode is appropriate when we have nominal or categorical data.
In these instances we are interested in how often each category was used or
appeared. In essence we count observations that appear in each category and
then report which category had the most observations.
Thus, the mode is the descriptive statistic that tells us which category has the
most observations or which category appears most often in the data.
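As a minimal sketch of this counting procedure, the snippet below tallies a small set of nominal responses and reports the most frequent category. The data and the variable names (`responses`, `counts`) are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical nominal data: each respondent's preferred news source.
responses = ["tv", "online", "online", "print", "tv", "online", "radio"]

# Count the observations that fall into each category.
counts = Counter(responses)

# The mode is the category with the most observations.
mode_category, mode_count = counts.most_common(1)[0]
print(mode_category, mode_count)  # online 3
```

If two categories tie for the highest count, `most_common(1)` reports only one of them, so a tie would need to be checked separately.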
Say, for example, that we are interested in people’s perceptions of what constitutes
sexual harassment. To determine this we could provide people with a list of behaviors
and ask them to respond by simply checking “yes” or “no” (nominal categories) if they
believe the behavior reflects sexual harassment:
1. sexual comments ____ yes ____ no
2. inappropriate gaze ____ yes ____ no
3. sexual jokes ____ yes ____ no
4. display of pornographic materials in the office ____ yes ____ no
5. “pick-up” or “come on” lines ____ yes ____ no
The mode will tell us which of these behaviors people most often perceived as sexual
harassment, because it reflects the behavior that received the most “yes” responses.
If we were instead interested in the behavior least often seen as harassing, we could
count up the “no” responses and report the behavior with the most “no” responses.
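The tally described here could be sketched as follows. The three answer sheets are invented for illustration, only three of the five behaviors are included to keep the example short, and the names `responses` and `yes_counts` are assumptions rather than part of any real instrument.

```python
# Hypothetical responses: one dict per respondent, True = "yes", False = "no".
responses = [
    {"sexual comments": True, "inappropriate gaze": False, "sexual jokes": True},
    {"sexual comments": True, "inappropriate gaze": True, "sexual jokes": False},
    {"sexual comments": True, "inappropriate gaze": False, "sexual jokes": False},
]

# Tally the "yes" responses for each behavior.
yes_counts = {}
for answer_sheet in responses:
    for behavior, said_yes in answer_sheet.items():
        yes_counts[behavior] = yes_counts.get(behavior, 0) + int(said_yes)

# The behavior with the most "yes" responses is the one reported as the mode.
most_yes = max(yes_counts, key=yes_counts.get)
print(yes_counts)
print(most_yes)  # sexual comments
```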
Median
The median divides a distribution of quantitative data exactly in
half. It is the score above which and below which half the
observations fall.
The median is most appropriate for describing the center point of
a set of ordinal data because it tells us the point at which half of
the cases rank higher and half rank lower.
For example, in a horse race the horse that finished fifth out of
nine represents the median, as four horses finished before or
above it and four horses finished behind or below it.
The median also can be used with interval/ratio data, but can be problematic
because it is not sensitive to extreme scores.
That is, two distributions may have the same median or middle point, but one
could include much higher and/or lower values than the other.
Simply seeing the median would lead us to believe that the two distributions of
scores are more similar than they actually are.
For example:
4 10 12 15 28 36 47
10 11 13 15 17 21 25
The median for both distributions is 15, but the first distribution includes much
lower and higher values (from 4 to 47) than the second one (from 10 to 25).
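The comparison can be checked with a short sketch that uses the two distributions from the example and Python's standard `statistics` module. The variable names `first` and `second` are chosen here for illustration.

```python
import statistics

first = [4, 10, 12, 15, 28, 36, 47]
second = [10, 11, 13, 15, 17, 21, 25]

# Same median, very different lowest and highest scores.
for scores in (first, second):
    middle = statistics.median(scores)
    print(middle, min(scores), max(scores))
# 15 4 47
# 15 10 25
```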
Beyond simply describing the middle point of a distribution,
researchers may use the median to create groups to compare. This
is called a median split. When researchers have ordinal or
interval/ratio data but want to create groups or categories, they
can use a median split to create two groups.
Accordingly, the researcher determines the median for the
variable of interest and then creates a “high” group with scores
above the median and a “low” group with scores below the
median.
For example, a researcher interested in comparing people high and low in
verbal aggressiveness can find the median of the verbal aggressiveness scores
for all participants. She can then take all of the cases above that median to
create the “high” group and all the scores below that median to create the
“low” group.
Then the researcher can compare people high and low in verbal aggressiveness
on some other variable of interest.
Do people high and low in verbal aggressiveness differ with regard to their
marital satisfaction? Their communication competence?
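A median split might be sketched like this. The participant IDs and verbal aggressiveness scores are hypothetical. The text does not say what to do with a score exactly equal to the median, so this sketch simply leaves such cases out of both groups, which is an assumption.

```python
import statistics

# Hypothetical verbal aggressiveness scores keyed by participant ID.
scores = {"p1": 12, "p2": 25, "p3": 18, "p4": 31, "p5": 9, "p6": 22, "p7": 15}

# Find the median of all participants' scores.
cut = statistics.median(scores.values())

# "High" group: scores above the median. "Low" group: scores below it.
# Assumption: a score equal to the median is placed in neither group.
high = [pid for pid, score in scores.items() if score > cut]
low = [pid for pid, score in scores.items() if score < cut]

print(cut)   # 18
print(high)  # ['p2', 'p4', 'p6']
print(low)   # ['p1', 'p5', 'p7']
```

The two groups can then be compared on another variable, such as marital satisfaction.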
Mean
The mean is the arithmetic average. It is computed by adding all
the scores in a distribution and dividing by the total number of
scores.
It helps to clarify what the average score on a variable of interest
is. For example, we may see any of the following reported.
“The mean…
“…for communication apprehension was 14.56.”
“…for hours of television watched per week was 8.56.”
“…for age of respondents in this study was 43.69 years.”
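The computation itself is short. The sketch below applies it to a hypothetical set of weekly television hours; the data are invented for illustration.

```python
# Hypothetical hours of television watched per week by five respondents.
hours = [6, 9, 12, 7, 11]

# Add all the scores, then divide by the number of scores.
mean_hours = sum(hours) / len(hours)
print(mean_hours)  # 9.0
```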
The mean is appropriate for interval/ratio data because of
“assumed equivalence,” or the idea that all points on the scale are
assumed to be of equal distance from one another (i.e., 1 is the
same distance from 2 as 2 is from 3, and so on).
Unlike the other measures of central tendency, the mean is
sensitive to all scores in the distribution, including extreme
scores.
That is why it is thought to be the most sophisticated measure of
central tendency.
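To illustrate this sensitivity, the sketch below takes the second distribution from the median example and swaps its highest score for a hypothetical extreme value of 95. The mean shifts, while the median does not.

```python
import statistics

scores = [10, 11, 13, 15, 17, 21, 25]
# Hypothetical change: replace the highest score (25) with an extreme score (95).
with_extreme = scores[:-1] + [95]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

print(f"{mean_before:.2f} {median_before}")  # 16.00 15
print(f"{mean_after:.2f} {median_after}")    # 26.00 15
```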
Measures of Dispersion
Measures of dispersion show us how data spread out in a distribution.
Think about, for example, dropping a glass of water and a can of motor oil on
the floor. Both will spill and disperse (i.e., spread out), but they will do so very
differently.
Thus, measures of dispersion tell us about how data spread out across a
distribution. They include…
the Range
Variance
and Standard Deviation
Range
The range is the simplest measure of dispersion. It reports the distance
between the highest and lowest scores in the distribution.
The range, therefore, is calculated by subtracting the lowest number from the
highest number in the distribution.
The range gives a general sense of how much the data spread out across the
distribution, which can be helpful for understanding whether a study included
a lot of variability or whether it drew from a narrow spectrum.
For example, if a researcher intends to study a communication variable across
a wide range of age groups, a sample of people aged 18-21 (a range of 3) is not
very diverse. Yet a sample of people aged 18-70 (a range of 52) is.
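As a small sketch, the range for two hypothetical age samples matching this example could be computed as follows. The helper name `score_range` and the individual ages are assumptions made for illustration.

```python
def score_range(scores):
    """Distance between the highest and lowest scores."""
    return max(scores) - min(scores)

# Hypothetical ages for two samples.
narrow_sample = [18, 19, 19, 20, 21, 21]
wide_sample = [18, 25, 33, 47, 58, 70]

print(score_range(narrow_sample))  # 3
print(score_range(wide_sample))    # 52
```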
One concern with the range is sensitivity to extreme scores. Because the range depends
only on the highest and lowest scores in the distribution, it can be misleading when
“outlier” scores exist. Outliers are scores that are far removed from the rest of the
distribution.
In the example of age just used you could have a distribution that ranges from 18-70,
yet there is only one person aged 70 and the next closest score is actually 24. The age
of 70 makes the distribution look much more spread out than it actually is. If we exclude
the outlier, which researchers often do when necessary, the range is actually 6, as the
scores spread from 18-24, not 52, as is the case when the outlier is included and the
scores spread from 18-70.
To avoid problems with outliers researchers may report the interquartile range. This
is the range of scores representing the middle 50% of scores (or the middle two quarters
of the distribution). The upper and lower 25% of scores (the outer quarters where
outliers will be) are excluded. An interquartile range provides a more conservative
representation of the range.
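A rough sketch of the interquartile range, using hypothetical ages with a single 70-year-old outlier, is shown below. It relies on `statistics.quantiles` from the Python standard library (Python 3.8 or later). Different software packages use slightly different rules for locating quartiles, so the exact value may vary a little across tools.

```python
import statistics

# Hypothetical ages: nine people aged 18-24 plus one 70-year-old outlier.
ages = [18, 19, 19, 20, 21, 22, 22, 23, 24, 70]

full_range = max(ages) - min(ages)

# With n=4, quantiles returns the three cut points between the quarters.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# The interquartile range spans the middle 50% of scores.
iqr = q3 - q1

print(full_range)  # 52
print(iqr)
```

The interquartile range stays small here because the outlier sits in the excluded upper quarter.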
Variance
Variance is the average squared distance of scores from the mean, so it is expressed in squared units.
We can compute variance when we have interval or ratio data.
Why squared units? Well, that has to do with how we compute variance. To
compute variance we do the following:
1. Subtract the mean score of the distribution from each score in the distribution
2. Square each of these values
3. Sum all of the squared values
4. Divide the sum of squared values by the total number of scores
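The four steps can be sketched as a small function. The scores are hypothetical. Note that dividing by the total number of scores, as in step 4, corresponds to what Python's `statistics.pvariance` computes; many statistical packages instead divide by one less than the number of scores when estimating variance from a sample.

```python
def variance(scores):
    """Variance following the four steps, dividing by the number of scores."""
    mean = sum(scores) / len(scores)
    # Step 1: difference between each score and the mean
    # (the direction of subtraction does not matter once the values are squared).
    deviations = [score - mean for score in scores]
    # Step 2: square each difference.
    squared = [d ** 2 for d in deviations]
    # Step 3: sum the squared values.
    sum_of_squares = sum(squared)
    # Step 4: divide by the total number of scores.
    return sum_of_squares / len(scores)

scores = [6, 7, 8, 12, 13, 14]  # hypothetical scores with a mean of 10
print(round(variance(scores), 2))  # 9.67
```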
When computing variance we need to square the values in step two so that
they do not cancel one another out in step 3.
For example, say that we have values that are +2, +3, and +4 points above the
mean and values that are -2, -3, and -4 points below the mean. When we go to
add these up without squaring them they will cancel each other out and we will
end up with a value of zero.
To ensure this doesn’t happen we square all of the values. Thus, 2, 3, and 4
become 4, 9, and 16 regardless of whether they were positive or
negative previously. This is so, as you may recall, because squaring a
negative number always produces a positive number.
Thus, all of the values are positive and can be summed for a total. This sum in
turn (known as the sum of squares) can be divided by the total number
of observations.
So, in our example above we would add
4 + 9 + 16 + 4 + 9 + 16 = 58
Then we would divide 58 by 6 (the number of observations) to obtain the
variance. In this case the variance is 58 / 6 = 9.67.
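The arithmetic in this example can be verified with a few lines. The list `deviations` holds the six distances from the mean used in the example; the variable names are chosen here for illustration.

```python
# Distances from the mean in the example: three above, three below.
deviations = [2, 3, 4, -2, -3, -4]

# Without squaring, the positive and negative values cancel out.
print(sum(deviations))  # 0

# Squaring first makes every value positive, so the sum is meaningful.
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)
variance = sum_of_squares / len(deviations)

print(sum_of_squares)      # 58
print(round(variance, 2))  # 9.67
```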
You can see that this last part of the process essentially involves the
computation of a mean.
Thus, it is helpful to think of variance as the mean or average of how scores
disperse or spread out from the mean score.
Variance is a helpful measure of dispersion; however, it is of limited use on its own
because it is no longer in the original units of measure. Rather, because of the
computation necessary, it ends up in squared units.
Standard Deviation
How then can we change variance into something usable and
meaningful? That is, how do we return to the original units of
measure?
Well, we need to get rid of the squared scores.
You may recall that we use the square root when we want to get
rid of squared scores. The same is true here.
We can take the square root of variance to calculate or compute a
measure of dispersion that is in the original units of measure.
This produces the standard deviation.
Standard deviation, like variance, is a measure of
dispersion that explains how much scores in a set of
interval/ratio data vary from the mean. However, unlike
variance, it is expressed in the original units of
measurement.
So, say the variance is 9.67 as was the case in our
earlier example…
…the square root of 9.67 is 3.11.
…the standard deviation therefore is 3.11.
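This step can be checked directly. The sketch below starts from the sum of squares (58) and number of observations (6) in the earlier variance example and takes the square root with `math.sqrt`.

```python
import math

# Variance from the earlier example: sum of squares divided by observations.
variance = 58 / 6

# The standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(round(variance, 2))            # 9.67
print(round(standard_deviation, 2))  # 3.11
```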
Standard deviation helps us understand how a distribution spreads out. It is
often reported alongside the mean score of a distribution.
So, for example, we may see reports that list any of the following means and
standard deviations:
M = 12.56, SD = 2.45
M = 10.21, SD = 5.64
M = 28.45, SD = 8.45
An italicized M is the statistical notation for the mean and an italicized SD is
the statistical notation for the standard deviation.
From the reports for each distribution above we would know both the
average score (M) and roughly how far scores in the distribution typically
fall from that average score (SD).
When we see the M and SD reported, we can draw some conclusions about the
distribution.
What if we saw the following descriptive statistics reported for three different
distributions of data?
M = 14.56, SD = 2.45
M = 14.56, SD = 5.64
M = 14.56, SD = 8.45
The examples above all include the same mean to make a point about the standard
deviation. In the first distribution the scores do not disperse widely, in the
second they disperse moderately, and in the third they disperse considerably.
Thus, the first distribution would appear as a tall and narrow curve, the second
as a bell-shaped curve, and the third as a broad and comparatively flat curve.
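The same idea can be sketched with three small hypothetical data sets that share a mean but spread out differently. These are not the distributions reported above (their raw scores are not given); the numbers are invented to show how M and SD are computed and compared. `statistics.pstdev` divides by the number of scores, matching the variance steps described earlier.

```python
import statistics

# Hypothetical distributions that share a mean of 15 but spread differently.
distributions = {
    "narrow": [13, 14, 15, 16, 17],
    "moderate": [9, 12, 15, 18, 21],
    "wide": [3, 9, 15, 21, 27],
}

summary = {}
for name, scores in distributions.items():
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # square root of the variance (divides by N)
    summary[name] = (m, sd)
    print(f"{name}: M = {m:.2f}, SD = {sd:.2f}")
```

The means are identical, so only the standard deviations reveal how differently the three sets of scores disperse.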