Central Tendency and Variability

MODULE 4 VISUAL DISPLAYS FOR CATEGORICAL VARIABLES

Visual Displays of Categorical Variables Introduction

Learning Objectives

· Evaluate visual displays of categorical variables.

A researcher asks students how they perceive their body weight. They might respond with overweight, underweight, or just about right. In this case each student is a unit of analysis, the answer options represent categories of responses, each answer option is a value, and all of the students’ responses together comprise a data set. One of the first steps in analyzing a sample of data such as this one is to examine the distribution of values for the data set’s variables.

Visual displays of data help researchers communicate the distribution and other key information (the story they are telling with their data) both effectively and efficiently, and they also support the researchers’ own exploration. Put another way, visual displays allow researchers to quickly identify interesting aspects of their data (for example, are the study’s participants predominantly satisfied with their body weight?), and to do so more efficiently than with words alone. Researchers take different approaches to visually displaying categorical and continuous variables. This skill builder focuses on visual displays for the former.

Identifying Categorical Variables

Categorical variables are those that have a small number of possible values. Usually, categorical variables involve nominal or ordinal levels of measurement. For example, political party affiliation is a nominal, categorical variable: it places individuals into one of just a few categories (e.g., Democrat, Republican, or Independent). An example of an ordinal, categorical variable is highest grade completed, with categories of less than high school, high school diploma, and more than high school. Again, this variable has just a small number of possible values. You will typically use categorical methods of displaying data, such as a bar chart or a pie chart, when there are fewer than about 10 to 12 categories. With too many categories, the displays become messy and difficult to read. Also keep in mind that pie charts and bar charts are not typically used for non-categorical variables. An example of a non-categorical variable would be students’ percentile ranking on a standardized math test; this variable has a large range of values, and students aren’t simply placed into one of a limited number of categories.
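The rule of thumb above can be sketched as a quick check on the number of distinct values. This is only a rough screen, not a definition: the function name `looks_categorical` and the cutoff of 12 are assumptions chosen for illustration, and the sample values are made up.

```python
def looks_categorical(values, max_categories=12):
    """Return True when the data contain fewer than max_categories distinct values.

    The cutoff of 12 is an assumed default based on the "10 or 12" guideline.
    """
    return len(set(values)) < max_categories


# Hypothetical data for illustration only
party = ["Democrat", "Republican", "Independent", "Democrat"]
percentile_rank = list(range(1, 100))  # 99 distinct percentile rankings

print(looks_categorical(party))            # few distinct values
print(looks_categorical(percentile_rank))  # many distinct values
```

A check like this depends on the sample at hand, so it is best read as a prompt to look at the variable's level of measurement rather than as a final answer.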

Learn by Doing

Which of the following variables would (YES) or would not (NO) be considered a categorical variable amenable to a categorical visual display?

· Weight perception with values of underweight, about right, and overweight (Yes / No)

· An individual’s weight measured in whole pounds (Yes / No)

· Hair color coded as black, brown, blonde, red, or other (Yes / No)

Getting back to the study of body image, presume that the researcher actually has a random sample of 1,200 U.S. college students who were asked the question of how they perceive their body weight as part of a larger survey. The following table shows part of the responses collected:

Body Image

Student       Body Image
student 25    overweight
student 26    about right
student 27    underweight
student 28    about right
student 29    about right

Here is some information that would be interesting to get from these data:

· What percentage of the sampled students fall into each category?

· How are students divided across the three body image categories? Are they equally divided? If not, do the percentages follow some other kind of pattern?

There is no practical way to answer these questions by looking at the raw data, which take the form of a long list of 1,200 responses. However, both questions can be answered easily once the researcher summarizes how often each category occurs and looks at the frequency distribution of the values for the variable Body Image.

Creating a table that lists the different values (categories) of the variable Body Image is the first step in summarizing the distribution of a categorical variable. For example, the table below shows how many times the value “About right” occurs (the count, or frequency) and, more importantly, what share of all responses it represents (the relative frequency, expressed as a percentage). To convert a count to a percentage, divide the frequency (855) by the total number of observations (1,200) to obtain the relative frequency, then multiply by 100.
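As a small illustration of summarizing how often each category occurs, the sketch below tallies only the five responses shown in the table excerpt (students 25 through 29), not the full sample of 1,200. It uses `collections.Counter` from the Python standard library; the variable names are chosen here for illustration.

```python
from collections import Counter

# The five responses shown in the excerpt (students 25 to 29)
responses = [
    "overweight",
    "about right",
    "underweight",
    "about right",
    "about right",
]

# Count how many times each category occurs
tally = Counter(responses)

# List categories from most to least frequent
for category, count in tally.most_common():
    print(f"{category}: {count}")
```

The same tally applied to all 1,200 responses would produce the counts used in the frequency distribution.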

Body Image Distribution

Category       Count       Percent
About right    855         (855/1200) * 100 = 71.3%
Overweight     235         (235/1200) * 100 = 19.6%
Underweight    110         (110/1200) * 100 = 9.2%
Total          n = 1200    100%
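The conversion from counts to percentages can be sketched in a few lines of Python. The counts come from the table above; the helper name `percent` is an assumption made for this example. The sketch uses `decimal` with half-up rounding because 855/1200 is exactly 71.25%, and Python's built-in `round()` resolves exact ties toward the even digit, which can give 71.2 rather than the 71.3 shown in the table.

```python
from decimal import Decimal, ROUND_HALF_UP

counts = {"About right": 855, "Overweight": 235, "Underweight": 110}
n = sum(counts.values())  # total number of observations (1200)


def percent(count, total):
    """Relative frequency as a percentage, rounded half up to one decimal place."""
    exact = Decimal(count) * 100 / Decimal(total)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


table = {category: percent(count, n) for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<12} {count:>5}   {table[category]}%")
print(f"{'Total':<12} {n:>5}")
```

Note that the three rounded percentages add up to 100.1 rather than 100. The total row reports 100% because the unrounded relative frequencies sum to exactly 1.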

Did I Get This

What are the correct percentages for each of the two remaining values (“Overweight” and “Underweight”) for the Body Image variable displayed in the table below? Drag each percentage to its correct location.

Category       Count       Percent
About right    855         (855/1200) * 100 = 71.3%
Overweight     235         ____
Underweight    110         ____
Total          n = 1200    100%

Options:

· (235/1200) * 100 = 19.6%

· (110/1200) * 100 = 9.2%

Visual Displays of Categorical Variables: Graphical Displays

Graphs

Now the researcher is ready to use a graphical display so that others can visualize the numerical summaries that were obtained. There are two simple graphical displays for visualizing the distribution of categorical data: the pie chart and the bar graph. Both help researchers, and those who use their results, visualize the distribution of a categorical variable. The pie chart (a circle divided into sections like a pie, shown below) emphasizes how the different categories relate to the whole, while the bar graph (side-by-side bars) emphasizes how the different categories compare with each other.

Categorical pie chart of body image showing percentages for About right, Overweight and Underweight categories.
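To make the "parts of a whole" idea concrete, the sketch below computes the angle each slice would occupy in a pie chart of the Body Image counts. It assumes the standard convention that a slice's angle is its relative frequency times 360 degrees; the variable names are illustrative.

```python
counts = {"About right": 855, "Overweight": 235, "Underweight": 110}
total = sum(counts.values())  # 1200 respondents

# Each slice's angle is its share of the full 360-degree circle
angles = {category: 360 * count / total for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category:<12} {angle:6.1f} degrees")
```

Because the slices are shares of one circle, the angles always add up to 360 degrees, which is why a pie chart suits a single variable whose categories cover every respondent exactly once.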

Complete the following activity to learn more about bar graphs by first summarizing the distribution of values in the Body Image variable and then interpreting the results to obtain the information you wanted.

Learn by Doing

Take a close look at each chart below and then answer the questions that follow. In reviewing the two bar graphs, note that “count” is often referred to as frequency, and percentage is often called “relative frequency.”

Bar graph of Body Image showing categories on x-axis and count on the y-axis.

Bar graph for Body Image showing categories on x-axis and percentages on y-axis.
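A rough text version of the two bar graphs can be sketched as follows, using the counts from the Body Image table. The helper `bar_lengths` and the 40-character width are assumptions for this illustration; each bar is scaled relative to the tallest one.

```python
counts = {"About right": 855, "Overweight": 235, "Underweight": 110}
total = sum(counts.values())
percents = {category: 100 * count / total for category, count in counts.items()}


def bar_lengths(values, width=40):
    """Scale each value so that the largest one is drawn `width` characters long."""
    tallest = max(values.values())
    return {category: round(value / tallest * width) for category, value in values.items()}


for title, values in (("Count", counts), ("Percent", percents)):
    print(title)
    for category, length in bar_lengths(values).items():
        print(f"  {category:<12} | {'#' * length}")
```

Printing both versions side by side makes it easy to compare what changes between a frequency scale and a relative frequency scale.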

What is the difference between the two bar graphs?

· The two bar graphs represent the distributions of two different variables

· The two bar graphs represent the distribution of "Body Image" obtained from two different samples

· The first bar graph represents the count of respondents that chose each category, while the second bar graph represents the percentage of respondents that chose each category

· There is no difference

Fill in the blanks:

The results suggest that the students ____ equally divided across the three body image categories.

The vast majority of students (71.3%) feel that they are ____.

The second most common response shows that 19.6% of the students feel that they are ____, and the body perception that occurred the least often was ____ (9.2%).

Visual Displays of Categorical Variables: Pictograms

Pictograms

A variation on the pie chart and bar graph that is commonly used in the media is the pictogram. Here are two examples:

Pictogram of how people flush a public toilet.

Source: USA Today Snapshots and the Impulse Research for Northern Confidential Bathroom survey.

Pictogram of how often salads are eaten per week.

Source: Market Facts for the Association of Dressings and Sauces

Learn by Doing

The following pictograph shows the amount of money spent on advertising in three magazines. According to the pictograph, what is the ratio of money spent advertising in Time compared to the amount spent advertising in Newsweek?

Pictograph showing the amount of money spent on advertising in three magazines.

· About twice as much is spent advertising in Time as in Newsweek.

· About 60% more is spent advertising in Time than in Newsweek.

· About four times as much is spent advertising in Time as in Newsweek.

Beware: Pictograms can be misleading.

Consider the ratio of advertising pictograph above. 

This graphic display is aimed at advertisers deciding where to spend their budgets, and it clearly suggests that Time magazine attracts by far the largest amount of advertising spending. Are the differences as dramatic as the pictogram suggests? Look carefully at the numbers above the pens, and you’ll find that advertisers’ spending in Time is only 1.64 times the spending in Newsweek ($4,433,879 / $2,698,386 = 1.64), and only 2.88 times the spending in U.S. News ($4,433,879 / $1,537,617 = 2.88). Just glancing at the pictogram, however, gives the impression that Time is much further ahead.

Why? Because the areas covered by the pen illustrations are not in proportion to the values they represent. In order to magnify the pictures of the pens without distorting them, a designer increased both the height and the width of each pen. As a result, the area of Time's pen is 1.64 * 1.64 = 2.7 times as large as the Newsweek pen, and 2.88 * 2.88 = 8.3 times as large as the U.S. News pen. Viewers’ eyes take in the area of the pens rather than only the height, and so are misled into thinking that Time is a bigger winner than it actually is.
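The arithmetic behind this distortion can be checked with a short sketch. The spending figures are the ones quoted above; the variable names are illustrative, and the sketch assumes each pen is scaled by the same factor in height and width, so its area grows with the square of the spending ratio.

```python
spending = {
    "Time": 4_433_879,
    "Newsweek": 2_698_386,
    "U.S. News": 1_537_617,
}

for magazine in ("Newsweek", "U.S. News"):
    # True ratio of spending (what the pen heights alone would show)
    ratio = spending["Time"] / spending[magazine]
    # Apparent ratio when both height and width are scaled by `ratio`
    area_ratio = ratio ** 2
    print(
        f"Time vs {magazine}: spending ratio {ratio:.2f}, "
        f"pen area ratio {area_ratio:.1f}"
    )
```

A pictogram avoids this exaggeration when only one dimension (for example, the height) varies with the value, or when each picture's area, rather than its height, is made proportional to the value.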

The distribution of a categorical variable is summarized using a graphical display, usually a pie chart or bar graph. Numerical summaries supplement the graphs in the form of category counts and percentages. A variation on pie charts and bar graphs is the pictogram, but pictograms can be misleading. Despite (or perhaps because of) their misleading potential, they are more common in the mass media than in refereed journals.

Visual Displays of Categorical Variables: Summary

Before You Continue

Evaluate your ability to perform each of the following tasks. In other words, how well can you do each task?

Evaluate visual displays of categorical variables.

· Not at all yet

· With a lot of support

· With some support

· With minimal support

· On my own

What concept or topic is the least clear to you at this point?

What other questions do you have?
