Central Tendency and Variability
MODULE 4 VISUAL DISPLAYS FOR CATEGORICAL VARIABLES
Visual Displays of Categorical Variables: Introduction
Learning Objectives
· Evaluate visual displays of categorical variables.
A researcher asks students how they perceive their body weight. They might respond with overweight, underweight, or just about right. In that case each student is a unit of analysis, the answer options are the categories of response, each answer option is a value, and all of the students’ responses together make up a data set. One of the first steps in analyzing a sample of data such as this one is to examine the distribution of values for each of the data set’s variables.
Visual displays of data help researchers communicate the distribution and other key information (the story they are telling with their data) effectively and efficiently, and they also support the researchers’ own exploration. Put another way, visual displays allow researchers to quickly identify interesting aspects of their data (for example, are the study’s participants predominantly satisfied with their body weight?), and to do so more efficiently than with words alone. Researchers take different approaches to visually displaying categorical and continuous variables. This skill builder focuses on visual displays for the former.
Identifying Categorical Variables
Categorical variables are those that have a small number of possible values. Usually, categorical variables involve nominal or ordinal levels of measurement. For example, political party affiliation is a nominal categorical variable: it places individuals into one of just a few categories (e.g., Democrat, Republican, or Independent). An example of an ordinal categorical variable is highest grade completed, with categories of less than high school, high school diploma, and more than high school. Again, this variable has just a small number of possible values. You will typically use categorical methods of displaying data, such as a bar chart or a pie chart, when there are fewer than about 10 to 12 categories. If there are too many categories, the displays become messy and difficult to read. Also keep in mind that pie charts and bar charts are not typically used for non-categorical variables. An example of a non-categorical variable is students’ percentile ranking on a standardized math test; this variable has a large range of values, and students aren’t simply placed into one of a limited number of categories.
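The rule of thumb above can be sketched as a quick check on how many distinct values a variable takes. This is only an illustrative sketch in Python: the function name `suits_categorical_display` and the default cutoff of 12 are assumptions based on the "10 or 12" guideline, and the sample values are made up for the example. Counting distinct values is not a substitute for judging the level of measurement.

```python
def suits_categorical_display(values, max_categories=12):
    """Return True when the variable has fewer distinct values than the cutoff.

    max_categories is an assumed cutoff taken from the text's rule of thumb.
    """
    distinct = set(values)
    return len(distinct) < max_categories


# Hypothetical sample values, for illustration only.
party = ["Democrat", "Republican", "Independent", "Democrat", "Republican"]
percentile_rank = list(range(1, 100))  # 99 distinct percentile values

print(suits_categorical_display(party))
print(suits_categorical_display(percentile_rank))
```

A small sample of a continuous variable can also have few distinct values, so a check like this is best treated as a first screen rather than a decision.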
Learn by Doing
Which of the following variables would (YES) or would not (NO) be considered a categorical variable amenable to a categorical visual display?
· Weight perception with values of underweight, about right, and overweight (Yes / No)
· An individual’s weight measured in whole pounds (Yes / No)
· Hair color coded as black, brown, blonde, red, or other (Yes / No)
Returning to the study of body image, suppose that the researcher has a random sample of 1,200 U.S. college students who were asked, as part of a larger survey, how they perceive their body weight. The following table shows part of the responses collected:
Body Image
Student | Body Image
student 25 | overweight
student 26 | about right
student 27 | underweight
student 28 | about right
student 29 | about right
Here is some information that would be interesting to get from these data:
· What percentage of the sampled students fall into each category?
· How are students divided across the three body image categories? Are they equally divided? If not, do the percentages follow some other kind of pattern?
There is no way to answer these questions by looking at the raw data, which are in the form of a long list of 1,200 responses, and thus not very manageable. However, both of these questions can be easily answered once the researcher summarizes how often each of the categories occurs and looks at the frequency distribution of the different values for the variable Body Image.
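As a minimal sketch of this summarizing step, the Python snippet below tallies how often each category occurs. It uses only the five responses shown in the table above (students 25 to 29), not the full sample of 1,200, and the variable names are illustrative.

```python
from collections import Counter

# The five responses shown in the table above (students 25 to 29).
responses = [
    "overweight",
    "about right",
    "underweight",
    "about right",
    "about right",
]

# Counter tallies how many times each category occurs.
counts = Counter(responses)

# List the categories from most to least frequent.
for category, count in counts.most_common():
    print(f"{category}: {count}")
```

The same tally applied to all 1,200 responses would give the counts used in the frequency table.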
Creating a table that presents the different values (categories) for the variable Body Image is the first step in summarizing the distribution of a categorical variable. For example, the table below shows how many times the value “About right” occurs (count) and, more importantly, how often this value occurs relative to the whole sample (relative frequency), expressed as a percentage. To convert a count to a percentage, divide the count (for example, 855) by the total number of observations (1200) to obtain the relative frequency, and multiply by 100.
Body Image Distribution
Category | Count | Percent
About right | 855 | (855/1200) * 100 = 71.3%
Overweight | 235 | (235/1200) * 100 = 19.6%
Underweight | 110 | (110/1200) * 100 = 9.2%
Total | n = 1200 | 100%
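The conversion from counts to percentages can be sketched in a few lines of Python. The counts come from the table above; the helper name `percent` is an illustrative choice.

```python
# Counts from the Body Image Distribution table.
counts = {"About right": 855, "Overweight": 235, "Underweight": 110}
total = sum(counts.values())  # total number of observations (n)


def percent(count, total):
    """Relative frequency expressed as a percentage."""
    return count / total * 100


for category, count in counts.items():
    print(f"{category:<12} {count:>5} {percent(count, total):6.2f}%")
```

The table reports these percentages to one decimal place (71.3%, 19.6%, and 9.2%). Rounded values need not add up to exactly 100%; here they add up to 100.1%.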
Did I Get This
What are the correct percentages for each of the two remaining values (“Overweight” and “Underweight”) for the Body Image variable displayed in the table below? Drag each percentage to its correct location.
Category | Count | Percent
About right | 855 | (855/1200) * 100 = 71.3%
Overweight | 235 | ____
Underweight | 110 | ____
Total | n = 1200 | 100%
· (235/1200) * 100 = 19.6%
· (110/1200) * 100 = 9.2%
Visual Displays of Categorical Variables: Graphical Displays
Graphs
Now the researcher is ready to use a graphical display so that others can visualize the numerical summaries that were obtained. There are two simple graphical displays for visualizing the distribution of categorical data: the pie chart and the bar graph. Both help researchers, and those who use their results, visualize the distribution of a categorical variable. The pie chart (a circle divided into sections like a pie) emphasizes how the different categories relate to the whole, while the bar graph (side-by-side bars) emphasizes how the different categories compare with each other.
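To make the two displays concrete, here is a text-only sketch in Python, assuming the counts from the Body Image Distribution table. The function names `bar` and `slice_degrees` and the 40-character bar width are illustrative choices; a plotting library would normally draw the actual charts.

```python
# Counts from the Body Image Distribution table.
counts = {"About right": 855, "Overweight": 235, "Underweight": 110}
total = sum(counts.values())


def bar(count, total, width=40):
    """Text bar whose length is proportional to the category's share."""
    return "#" * round(count / total * width)


def slice_degrees(count, total):
    """Central angle of the category's slice in a pie chart."""
    return count / total * 360


for category, count in counts.items():
    angle = slice_degrees(count, total)
    print(f"{category:<12} {bar(count, total):<40} {angle:6.1f} deg")
```

The bars sit side by side for comparison with each other, while the slice angles always add up to the full 360 degrees of the circle, which is why the pie chart emphasizes each category's relation to the whole.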
Complete the following activity to learn more about bar graphs by first summarizing the distribution of values in the Body Image variable and then interpreting the results to obtain the information you wanted.
Learn by Doing
Take a close look at each chart below and then answer the questions that follow. In reviewing the two bar graphs, note that “count” is often referred to as frequency, and percentage is often called “relative frequency.”
What is the difference between the two bar graphs?
· The two bar graphs represent the distributions of two different variables
· The two bar graphs represent the distribution of "Body Image" obtained from two different samples
· The first bar graph represents the count of respondents that chose each category, while the second bar graph represents the percentage of respondents that chose each category
· There is no difference
"Fill in the blank" question: select the correct answer.
The results suggest that the students ____ equally divided across the three body image categories.
The vast majority of students (71.3%) feel that they are ____.
The second most common response shows that 19.6% of the students feel that they are ____, and the body perception that occurred the least often was ____ (9.2%).
Visual Displays of Categorical Variables: Pictograms
Pictograms
A variation on the pie chart and bar graph that is commonly used in the media is the pictogram. Here are two examples:
Source: USA Today Snapshots and the Impulse Research for Northern Confidential Bathroom survey.
Source: Market Facts for the Association of Dressings and Sauces
Learn by Doing
The following pictograph shows the amount of money spent on advertising in three magazines. According to the pictograph, what is the ratio of money spent advertising in Time compared to the amount spent advertising in Newsweek?
· About twice as much is spent advertising in Time as in Newsweek.
· About 60% more is spent advertising in Time than in Newsweek.
· About four times as much is spent advertising in Time as in Newsweek.
Beware: Pictograms can be misleading.
Consider the advertising pictograph above.
This graphic display is aimed at advertisers deciding where to spend their budgets, and it clearly suggests that Time magazine attracts by far the largest amount of advertising spending. Are the differences as dramatic as the pictogram suggests? Look carefully at the numbers above the pens, and you’ll find that advertisers’ spending in Time is only 1.64 times as much as in Newsweek ($4,433,879 / $2,698,386 = 1.64), and only 2.88 times as much as in U.S. News ($4,433,879 / $1,537,617 = 2.88). Just glancing at the pictogram, however, gives the impression that Time is much further ahead.
Why? Because the areas covered by the pen illustrations are not in proportion to the values they represent. To enlarge the pictures of the pens without distorting them, the designer increased both the height and the width of each pen. As a result, the area of Time's pen is 1.64 * 1.64 = 2.7 times as large as the Newsweek pen, and 2.88 * 2.88 = 8.3 times as large as the U.S. News pen. Viewers’ eyes register the area of the pens rather than only the height, and so viewers are misled into thinking that Time is a bigger winner than it actually is.
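The arithmetic behind this distortion can be checked with a short Python sketch. It uses the spending figures quoted above; the function names `height_ratio` and `area_ratio` are illustrative, and the sketch assumes each pen is scaled by the same factor in height and width, as described.

```python
# Advertising spending figures quoted in the text (U.S. dollars).
spending = {
    "Time": 4_433_879,
    "Newsweek": 2_698_386,
    "U.S. News": 1_537_617,
}


def height_ratio(a, b):
    """How many times as much is spent in magazine a as in magazine b."""
    return spending[a] / spending[b]


def area_ratio(a, b):
    """Ratio of drawn areas when height and width are both scaled.

    Scaling both dimensions by the same factor squares the ratio.
    """
    return height_ratio(a, b) ** 2


for other in ("Newsweek", "U.S. News"):
    print(
        f"Time vs {other}: spending x{height_ratio('Time', other):.2f}, "
        f"drawn area x{area_ratio('Time', other):.1f}"
    )
```

A pictogram that scaled only the height of each pen, or that repeated a fixed-size icon, would keep the drawn area in proportion to the spending.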
The distribution of a categorical variable is summarized using a graphical display, usually a pie chart or bar graph. Numerical summaries supplement the graphs in the form of category counts and percentages. A variation on pie charts and bar graphs is the pictogram, but pictograms can be misleading. Despite (or perhaps because of) their misleading potential, they are more common in the mass media than in refereed journals.
Visual Displays of Categorical Variables: Summary
Before You Continue
Evaluate your ability to perform each of the following tasks. In other words, how well can you do each task?
Evaluate visual displays of categorical variables.*
· Not at all yet
· With a lot of support
· With some support
· With minimal support
· On my own
* Required questions
What concept or topic is the least clear to you at this point?
What other questions do you have?