Descriptive Statistics

MODULE 4 VISUAL DISPLAYS FOR CATEGORICAL VARIABLES

Visual Displays of Categorical Variables Introduction

Learning Objectives

· Evaluate visual displays of categorical variables.

A researcher asks students how they perceive their body weight. They might respond with overweight, underweight, or just about right. In this case, each student is a unit of analysis, the answer options represent categories of responses, each answer option is a value, and all of the students’ responses together comprise a data set. One of the first steps in analyzing a sample of data such as this one is to examine the distribution of values for the data set’s variables.

Visual displays of data help researchers explore their own data and communicate the distribution and other key information (the story they are telling with their data) both effectively and efficiently. Put another way, visual displays allow researchers to quickly identify interesting aspects of their data (for example, are the study’s participants predominantly satisfied with their body weight?), and to do so more efficiently than with words alone. Researchers take different approaches for visually displaying categorical and continuous variables. This skill builder focuses on visual displays for the former.

Identifying Categorical Variables

Categorical variables are those that have a small number of possible values. Usually, categorical variables involve nominal or ordinal levels of measurement. For example, political party affiliation is a nominal, categorical variable. This variable places individuals into one of just a few categories (e.g., Democrat, Republican, or Independent). An example of an ordinal, categorical variable is highest grade completed, with categories of less than high school, high school diploma, and more than high school. Again, this variable has just a small number of possible values. You will typically use categorical methods of displaying data, such as a bar chart or a pie chart, when there are fewer than about 10 to 12 categories. If there are too many categories, the displays become messy and difficult to read. Also keep in mind that pie charts and bar charts are not typically used for non-categorical variables. An example of a non-categorical variable would be students’ percentile ranking on a standardized math test; this variable has a large range of values, and students aren’t simply placed into one of a limited number of categories.

Learn by Doing

Which of the following variables would (YES) or would not (NO) be considered a categorical variable amenable to a categorical visual display?

· Weight perception with values of underweight, about right, and overweight

· An individual’s weight measured in whole pounds

· Hair color coded as black, brown, blonde, red, or other

Returning to the study of body image, suppose that the researcher has a random sample of 1,200 U.S. college students who were asked how they perceive their body weight as part of a larger survey. The following table shows some of the responses collected:

Body Image

Student       Body Image
student 25    overweight
student 26    about right
student 27    underweight
student 28    about right
student 29    about right

Here is some information that would be interesting to get from these data:

· What percentage of the sampled students fall into each category?

· How are students divided across the three body image categories? Are they equally divided? If not, do the percentages follow some other kind of pattern?

It is impractical to answer these questions by looking at the raw data, which are in the form of a long list of 1,200 responses and thus not very manageable. However, both questions can be easily answered once the researcher summarizes how often each of the categories occurs and looks at the frequency distribution of the different values for the variable Body Image.

Creating a table that presents the different values (categories) for the variable Body Image is the first step in summarizing the distribution of a categorical variable. For example, the table below shows how many times the value “About right” occurs (count) and, more importantly, how often this value occurs (relative frequency) as a percentage. To convert a count to a percentage, divide the count (855 for “About right”) by the total number of observations (1200) to obtain the relative frequency, and multiply by 100.

Body Image Distribution

Category       Count       Percent
About right    855         (855/1200) * 100 = 71.3%
Overweight     235         (235/1200) * 100 = 19.6%
Underweight    110         (110/1200) * 100 = 9.2%
Total          n = 1200    100%
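
The count-to-percent conversion in the Body Image Distribution table can be sketched in a few lines of Python. This is only an illustration: the names `counts`, `percent`, and `percents` are chosen here and are not part of the course material. Because 855/1200 is exactly 71.25%, the sketch assumes round-half-up rounding so that it reproduces the 71.3% shown in the table; with other rounding rules the first figure could come out as 71.2%.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts copied from the Body Image Distribution table.
counts = {"About right": 855, "Overweight": 235, "Underweight": 110}

total = sum(counts.values())  # n = 1200


def percent(count, n):
    """Relative frequency as a percentage, rounded half up to one decimal."""
    exact = Decimal(count) * 100 / Decimal(n)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


percents = {category: percent(count, total) for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<12} {count:>5} {str(percents[category]) + '%':>7}")
print(f"{'Total':<12} {total:>5}")
```

The rounded percentages add up to 100.1 rather than exactly 100, which is an ordinary side effect of rounding each category separately.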

Did I Get This

What are the correct percentages for each of the two remaining values (“Overweight” and “Underweight”) for the Body Image variable displayed in the table below? Match each percentage to its correct location.

Category       Count       Percent
About right    855         (855/1200) * 100 = 71.3%
Overweight     235         ?
Underweight    110         ?
Total          n = 1200    100%

Answer choices:

· (235/1200) * 100 = 19.6%

· (110/1200) * 100 = 9.2%

MODULE 5 VISUAL DISPLAYS FOR CONTINUOUS VARIABLES

Visual Displays for Continuous Variables Introduction

Learning Objectives

· Evaluate visual displays of data for continuous variables.

A researcher conducted a study in which she observed students’ scores on an examination. One of the first steps in analyzing a sample of data is to examine the distribution of values for variables in the data set. The distribution of the data tells her about the frequency with which various values are observed. Distributions can be examined in visual displays such as tables and graphs. A good graph or table is informative and allows researchers to identify and communicate important characteristics of the data. Different approaches are taken for visually displaying categorical and continuous variables.

MODULE 6 MEASURES OF CENTRAL TENDENCY FOR CONTINUOUS VARIABLES

Measures of Central Tendency for Continuous Variables Introduction

Learning Objectives

· Select the appropriate measure of central tendency for a continuous variable.

A study was performed to find out whether pamphlets containing information for cancer patients are written at a level that the cancer patients can understand. Tests were administered to measure the reading ability of 63 cancer patients, and then the readability levels of 30 cancer pamphlets were evaluated based on such factors as the lengths of the sentences and the number of polysyllabic words. Both the reading ability and readability levels correspond to grade levels, but patients' reading levels below Grade 3 and above Grade 12 could not be determined exactly. (Source: Short, Moriarty, and Cooley. (1995). "Readability of Educational Materials for Cancer Patients." Journal of Statistics Education, v.3, n.2)

The following tables indicate the number of patients at each reading ability level and the number of pamphlets at each readability level.

Patients' Reading Level    <3   3   4   5   6   7   8   9   10   11   12   >12
Count                       6   4   4   3   3   2   6   5    4    7    2    17

Pamphlets' Readability Level    6   7   8   9   10   11   12   13   14   15   16
Count                           3   3   8   4    1    1    4    2    1    2    1

For this scenario, a researcher might be interested in the typical reading level for patients in the sample or the typical readability level of the pamphlets. In other words, researchers might want to know the central tendency for each of these variables. The central tendency is the value that is the most representative of the entire distribution of scores for a variable. Measures of central tendency for continuous variables are important for researchers and decision-makers because they are often most interested in the typical case.
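
As a rough illustration of what "typical" can mean here, the sketch below computes common measures of central tendency from the two tables above using Python's standard `statistics` module. The helper `expand` and all variable names are assumptions made for this example. For the patients, only a median is computed: the "<3" and ">12" levels were not determined exactly, so a mean cannot be calculated from the table, whereas the middle observation can still be located because the levels are ordered.

```python
import statistics

# Counts copied from the two tables above.
patient_levels = ["<3", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", ">12"]
patient_counts = [6, 4, 4, 3, 3, 2, 6, 5, 4, 7, 2, 17]

pamphlet_levels = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
pamphlet_counts = [3, 3, 8, 4, 1, 1, 4, 2, 1, 2, 1]


def expand(levels, counts):
    """Repeat each level by its count to get one entry per observation."""
    return [level for level, count in zip(levels, counts) for _ in range(count)]


# Pamphlets: every level is an exact number, so mean, median, and mode all work.
pamphlets = expand(pamphlet_levels, pamphlet_counts)
pamphlet_mean = statistics.mean(pamphlets)
pamphlet_median = statistics.median(pamphlets)
pamphlet_mode = statistics.mode(pamphlets)

# Patients: the levels are listed in order, so the expanded list is already
# sorted. With n = 63 (odd), the median is the single middle entry.
patients = expand(patient_levels, patient_counts)
patient_median = patients[len(patients) // 2]

print("Pamphlets: mean", pamphlet_mean, "median", pamphlet_median, "mode", pamphlet_mode)
print("Patients: median reading level", patient_median)
```

Which of these measures is most appropriate depends on the shape of the distribution and on the level of measurement.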

MODULE 7 STANDARD DEVIATION AS A MEASURE OF VARIABILITY FOR CONTINUOUS VARIABLES

Measures of Variability

Learning Objectives

· Interpret the standard deviation for a given variable.

How much do students differ from one another in their perceptions of their instructor? Some students will view their instructor as a high performer, and others may view the same instructor as not so great. In other words, there will be variability in how students view the instructor’s performance.

Alternatively, consider a manufacturing process with two machines producing the same product. During a given day, one of the machines produces parts that vary in length by 0.01 inches, and the other machine produces parts that vary by 0.001 inches. The production manager will be interested in the variability of the lengths and may have a problem with the first machine if the standard variability is supposed to be less than 0.01 inches.

Variability in the real world is everywhere, and researchers seek to understand and explain it. Researchers often ask, “How much variance is explained?” A first step in explaining variability is measuring it.

There are a number of measures of variability, but variance and standard deviation are the two measures most often used for continuous variables (i.e., those involving interval and ratio levels of measurement). As you’ll see below, these two measures are related to one another. Standard deviation, however, is the measure of variability that researchers typically report because it conveys the amount of variability using the same units as that of the mean (and also the same units as that of the variable itself). This allows readers of the research to easily understand the typical number of units that individuals in the sample vary from the mean.

For a sample of data, the variance is denoted $s^2$ and is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

In this formula, $x_i$ is an observation for the $i$th individual, $\bar{x}$ is the sample mean (i.e., the simple average), and $n$ is the sample size. The Greek sigma, $\sum$, is called a summation operator, and you will see how it works in a learning activity.

Note that the variance gives you an indication of how spread out the scores are around the mean. The standard deviation is $s$, which is the square root of the variance:

$$s = \sqrt{s^2}$$

The standard deviation provides an approximate indication of the typical amount that scores differ from the mean. So a small standard deviation indicates that scores don’t typically differ that much from the mean, and a large standard deviation indicates that scores typically differ substantially from the mean.
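
To make the two formulas concrete, here is a minimal Python sketch that applies them step by step. The `scores` list is hypothetical data invented for this illustration; it does not come from any study described here. The built-in `statistics.variance` and `statistics.stdev` functions, which also divide by n - 1, are used as a cross-check.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n  # the sample mean

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)

# Cross-check against the standard library's sample (n - 1) versions.
print("library variance:", statistics.variance(scores))
print("library standard deviation:", statistics.stdev(scores))
```

Because the standard deviation is in the same units as the scores themselves, it is usually the easier of the two numbers to interpret.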

Fortunately, you usually do not need to perform the calculations for variance or standard deviation by hand; instead, you can rely on programs like SPSS to do the work for you. It is still important to understand, at least in outline, how these measures of variability are calculated. For example, note the subtraction step in calculating the variance. With ratio and interval scales of measurement, differences between items being measured in the real world can be quantified, and hence the subtraction process is meaningful for interval and ratio data. Because nominal data do not quantify differences between elements in the real world in a meaningful way, the standard deviation and variance are not used with this scale of measurement. Although the same could be argued for ordinal data, many researchers report the standard deviation for this type of data. If a researcher reports a standard deviation or mean, there has been an implicit willingness to accept the data as being at an interval or ratio level of measurement.

Did I Get This

If a researcher reports the sample variance ($s^2$) for a variable in the study, how would you determine the standard deviation?

· Square the variance

· Calculate the square root of the variance

MODULE 8 MEASURES OF CENTRAL TENDENCY AND VARIABILITY FOR CATEGORICAL VARIABLES

Measures of Central Tendency and Variability for Categorical Variables Introduction

Learning Objectives

· Explain the measures of central tendency and variability that are typically used for categorical variables.

As a researcher, you will often need to describe the variables you will be using in your analyses using measures of central tendency and variability. For example, you may want to describe variables such as gender or race that provide demographic information about your sample. Demographic data are often categorical, meaning that they only take on a limited number of values. (Recall that categorical variables are typically measured using a nominal or ordinal level of measurement.)

For example, consider a demographic variable from the Afrobarometer data set that is categorical: region of the continent in which the respondent resides. The Afrobarometer data set is based on a survey of African citizens’ attitudes on democracy, governance, the economy, and other related topics (www.afrobarometer.org). The bar graph below presents the data for our variable of interest. Look at the graph and think about how we might describe this variable.

[Figure: Bar graph with country by region on the x-axis and percent on the y-axis. The regions on the x-axis are West Africa, East Africa, Southern Africa, and North Africa. The percent values on the y-axis range from 0 to 40.]

What does the graph tell you? Looking at the graph, you can see that West Africa appears to be the region where respondents most commonly reside, but each of the four regions represents at least 10% of the total sample.

The remainder of this skill builder focuses on categorical variables that involve nominal or ordinal level of measurement. Note that ordinal scales with numerical values can sometimes be examined using the median, range, and interquartile range, but these concepts are not covered in this skill builder.
