
Skill Builder 8: Measures of Central Tendency and Variability for Categorical Variables

Introduction

As a researcher, you will often need to describe the variables you will be using in your analyses using measures of central tendency and variability. For example, you may want to describe variables such as gender or race that provide demographic information about your sample. Demographic data are often categorical, meaning that they only take on a limited number of values. (Recall that categorical variables are typically measured using a nominal or ordinal level of measurement.)

For instance, consider a demographic variable from the Afrobarometer data set that is categorical: region of the continent in which the respondent resides. The Afrobarometer data set is based on a survey of African citizens’ attitudes on democracy, governance, the economy, and other related topics (www.afrobarometer.org). The bar graph below presents the data for our variable of interest. Look at the graph and think about how we might describe this variable.

[Figure: bar graph of Afrobarometer respondents by region of Africa]

What does the graph tell you? Looking at the graph, you can see that West Africa is the most common region among respondents, but each of the four regions represents at least 10% of the total sample.

The remainder of this Skill Builder focuses on categorical variables that involve a nominal or ordinal level of measurement. Note that ordinal scales with numerical values can sometimes be examined using the median, range, and interquartile range, but these concepts are not covered in this Skill Builder.

Central Tendency

Recall that there are three measures of central tendency: the mean, the median, and the mode. For categorical data that involve nominal measurement, the only measure of central tendency that is appropriate to report is the mode, which is the most frequently occurring value for the variable.

Take a look again at the bar graph above. You can tell by looking at the graph that the mode is West Africa. This value is slightly more frequent than Southern Africa.

Recall that the median requires that the data are measured at the ordinal, interval, or ratio level of measurement, and researchers typically only report the mean for variables that they view as being at the interval or ratio level of measurement. If we think about a categorical variable like gender that has values of male and female, it becomes apparent why we would report the mode and not, for example, the mean. Individuals will be placed into the category of either male or female, and there are no categories in between; therefore, it would not make sense to discuss the average gender for the sample. Rather, researchers would want to convey what the most typical gender is for the sample by reporting which gender – male or female – is most common.

Reporting Frequencies

As noted above, when reporting descriptive data for categorical, nominal variables, one key measure to report is the mode. Researchers will often report the frequency of each category as well, since this is also important descriptive information. Take a look at the following table, which is based on the graph above:

Region             Frequency    Percent
West Africa           19,196       37.2
East Africa            8,399       16.3
Southern Africa       18,003       34.9
North Africa           5,989       11.6
Total                 51,587      100.0

Note that the table contains the frequency of each category, both as a count (in the “Frequency” column) and as a percent. These percents are calculated by first dividing the count for the value of the categorical variable by the total number of observations in the data set and then multiplying by 100. For example, in order to arrive at the conclusion that 37.2% of the sample is from West Africa, this is the calculation you would do: (19,196/51,587) * 100. To state the frequencies for this variable, a researcher might write this: Participants were from four different regions of Africa: West Africa (37.2%), East Africa (16.3%), Southern Africa (34.9%), and North Africa (11.6%). You can use this as a model when you are reporting your own results.
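The percent calculation and the mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are copied from the table above, and the names counts, percent_table, and mode are hypothetical helpers introduced for this example, not part of any Afrobarometer tool.

```python
# Counts copied from the region table above.
counts = {
    "West Africa": 19196,
    "East Africa": 8399,
    "Southern Africa": 18003,
    "North Africa": 5989,
}


def percent_table(counts):
    """Return each category's share of the total as a percent (one decimal)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


def mode(counts):
    """Return the category with the highest count."""
    return max(counts, key=counts.get)


percents = percent_table(counts)
for category, pct in percents.items():
    # Example line: West Africa: 19,196 (37.2%)
    print(f"{category}: {counts[category]:,} ({pct}%)")
print("Mode:", mode(counts))
```

The percents it prints should match the Percent column of the table, and the mode it reports should be West Africa.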

Variability

Variability is another statistic that researchers report to describe their variables. Variability is often thought of as uncertainty. If you were to randomly select a single element from a population or sample, maximum uncertainty or variability is associated with an equal likelihood of the possible outcomes. When one outcome is almost certain, there is minimal variability.

In other words, with categorical variables, examining variability requires looking at all of the values for the variable and the distribution of observations among the values. When values for the variable are about equal in their likelihood of occurring, there will be greater variability, and when the values are more unequal in their likelihood of occurring, there will be less variability. For example, for a variable with two categories, you would have the maximum amount of possible variability when 50% of participants are in one category and 50% of participants are in the other category.

To think more about the variability of categorical variables, consider the following tables that summarize responses to two questions on the 2014 General Social Survey. This survey measures attitudes, behaviors, and attributes in the United States (http://gss.norc.org/). One of the first questions asked the respondents to report their gender, and another question asked them to report whether or not they were United States citizens.

Gender      Frequency    Percent
Male             1141       45.0
Female           1397       55.0
Total            2538      100.0

U.S. Citizen?    Frequency    Percent
Yes                   1186       93.4
No                      84        6.6
Total                 1270      100.0

Based on the tables above, you can see that gender has greater variability than the U.S. citizenship variable. Why? Because if you randomly select one person from the sample, you would be relatively uncertain of that person's gender: participants are fairly evenly distributed across the two gender categories. On the other hand, there is a very good chance the person is a U.S. citizen, as 93.4% of the sample falls into this category. So there is more uncertainty, or variability, associated with gender than with citizenship.

Except for variables that have only two values, there is no common measure of variability for categorical variables. With binary variables, however, you can calculate a standard deviation and variance. Binary variables are those that have only two possible values.

Gender is a good example of a binary variable. The two expected responses for this variable are male and female. The standard deviation for a sample involving a binary variable is

$$s = \sqrt{P_1 \times P_2}$$

where s is the standard deviation, P1 is the proportion (relative frequency) or percent for one group (e.g., males), and P2 is the proportion or percent for the second group (e.g., females). Note that using proportions vs. percents changes the unit of measurement. Proportions will result in a standard deviation of less than 1.0, and percents will result in a standard deviation of less than 100%.
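As a rough check on the comparison between gender and citizenship, the calculation can be sketched in Python. This sketch assumes the formula is s = sqrt(P1 × P2) with P1 and P2 expressed as proportions of the sample. The function name binary_sd is a hypothetical helper for illustration, and the counts are taken from the two General Social Survey tables above.

```python
import math


def binary_sd(count_a, count_b):
    """Standard deviation of a two-category variable, on the proportion scale.

    Assumes s = sqrt(P1 * P2), where P1 and P2 are the two categories'
    proportions of the sample (so P1 + P2 = 1).
    """
    total = count_a + count_b
    p1 = count_a / total
    p2 = count_b / total
    return math.sqrt(p1 * p2)


gender_sd = binary_sd(1141, 1397)   # male, female
citizen_sd = binary_sd(1186, 84)    # yes, no

print(f"Gender:      s = {gender_sd:.3f} ({100 * gender_sd:.1f}%)")
print(f"Citizenship: s = {citizen_sd:.3f} ({100 * citizen_sd:.1f}%)")
```

Under these assumptions the gender value comes out close to 0.5, the largest value the formula can produce (reached with a 50/50 split), while the citizenship value is roughly half that. This agrees with the earlier observation that gender has more variability than citizenship. Multiplying by 100 expresses the same result in percent units.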