Week 2 - Assignment 2: Interpret Graphs Using Excel and SPSS


A Conceptual Guide to Statistics Using SPSS

Descriptive Statistics

By: Elliot T. Berkman & Steven P. Reise

Book Title: A Conceptual Guide to Statistics Using SPSS

Chapter Title: "Descriptive Statistics"

Pub. Date: 2012

Access Date: June 10, 2022

Publishing Company: SAGE Publications, Inc.

City: Thousand Oaks

Print ISBN: 9781412974066

Online ISBN: 9781506335254

DOI: https://dx.doi.org/10.4135/9781506335254.n2

Print pages: 5-18

© 2012 SAGE Publications, Inc. All Rights Reserved.



Descriptive Statistics

Chapter Outline

• Introduction to Descriptive Statistics
• Computing Descriptive Statistics in SPSS
• Interpreting the Output
• A Closer Look: Eyeballing a Hypothesis Test
• A Closer Look: Assessing for Normality

Introduction to Descriptive Statistics

Before we get into the formal “hypothesis testing” type of statistics that you're used to seeing in research articles, we will briefly cover some descriptive statistics. Descriptives are numbers that give you a quick summary of what your data look like, such as where the middle of the distribution is and how much the observations are spread around that middle.

Why should you be interested in these things? First, looking at descriptives is an excellent way to check for errors in data entry (which is especially important if you employ undergraduate research assistants!) and other irregularities in the data such as extreme outliers. Second, because good science involves making small improvements over what has been done before, most researchers tend to use the same measures over and over. Carefully examining your raw data is one of the best ways to familiarize yourself with measures and subject populations that you will use repeatedly. Knowing roughly what the distributions of those things should be and being able to recognize when they look unusual will make you a better researcher and enable you to impress your colleagues with your vast knowledge of bizarre data tics. For example, did you know that many scales of the trait “self-monitoring” have a two-humped (bimodal) distribution? Finally, sometimes your data analysis can start and end with descriptive statistics. As we will see, one measure in particular—the standard error of the mean—plays an important role in hypothesis testing. Often, when the standard error is very small (or, sadly, at times very large), we need not bother even doing hypothesis testing because our conclusions are foregone. In this way, descriptives can give you a fast “sneak peek” not only at your data but also at what conclusions you might draw from them.

Computing Descriptive Statistics in SPSS

We will practice descriptive statistics using the data set “Descriptives.sav” on the course website (see Chapter 1). Once the file is open and you have access to the data, you can get just about any information about them from the Descriptive Statistics menu. Click on Analyze → Descriptive Statistics to pull this up.

SAGE © 2012 by SAGE Publications, Inc.

SAGE Books

Page 2 of 15 SAGE Books - Descriptive Statistics

The first decision to make is whether the variable you'd like to examine is categorical or continuous. Categorical variables are made up of distinct groups that differ qualitatively rather than quantitatively, such as sex or experimental treatment condition. Continuous variables are those with a natural underlying metric, such as height, or those that vary along a continuum, such as happiness. To tell the difference, the key question to ask is, “Does it make sense to say that one level of my variable is greater than another?” If so, then your variable can likely be treated as continuous; otherwise, it is categorical. For example, it makes sense to say that a height of 6 ft is greater than a height of 5 ft, but not that Drug A is greater than Drug B.


If you're dealing with a categorical variable, you'll want to click on Frequencies. In the window that opens, move all of the variables of interest into the right window. By default, this will display a frequency table for each selected variable that tells you how many observations and which percentage of the total are in each category. And that's pretty much everything that you'd want to know about your categorical variables.

The syntax for getting frequencies on these two categorical variables is as follows:

FREQUENCIES VARIABLES=condition sex.
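To make concrete what such a frequency table contains, here is a minimal Python sketch of the same computation; the category labels and counts are invented for illustration, not taken from Descriptives.sav:

```python
from collections import Counter

# Hypothetical values standing in for the "condition" variable.
condition = ["no warning"] * 53 + ["warning"] * 47

counts = Counter(condition)
n = sum(counts.values())

# Each row of a frequency table: category, count, and percentage of the total.
for category, count in counts.most_common():
    print(f"{category}: {count} ({100 * count / n:.1f}%)")
```

This is exactly the information SPSS reports per category: the raw count and its share of the total.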

For continuous variables, both the Frequencies and the Descriptives options are useful. In Frequencies, clicking on the Statistics tab brings up a variety of options for summarizing your data. For example, we can get lots of information about the variable age, such as its central tendency (mean, median, mode, sum), variability (including the standard error of the mean mentioned previously), and the shape of the distribution (skewness and kurtosis). You can also get SPSS to find the cut points to divide your continuous variable into any number of equal groups (use “Quartiles” for four) or groups based on percentiles.
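The cut points that quartiles produce can be sketched with Python's standard library; the ages below are invented, and SPSS's interpolation rule may differ slightly from the method chosen here, so boundary values can disagree a little:

```python
from statistics import quantiles

# Invented ages for illustration.
ages = [16, 19, 22, 25, 28, 31, 34, 45]

# n=4 returns the three cut points (25th, 50th, and 75th percentiles)
# that divide the data into four equal groups.
cuts = quantiles(ages, n=4, method="inclusive")
```

The middle cut point is the median; the outer two bound the middle half of the data.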


The last really cool thing that Frequencies can give you is a histogram of the distribution. Along with the skewness and kurtosis, this can give you a sense of whether your variable is approximately normally distributed (which we will see is one of the assumptions that we make about our data that should be checked). In the Frequencies menu, click on Charts, and select “Histograms: With normal curve.” Then click “Continue.”


One thing to remember when using Frequencies with continuous data is to uncheck the “Display frequency tables” button on the main window. Otherwise, your output screen will be slammed with a large table with one entry per value of your variable. The syntax for this is as follows:

FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE     /* Remove the large frequency table
  /NTILES=4           /* Use "4" for quartiles
  /STATISTICS=ALL     /* Get every statistic in the book
  /HISTOGRAM NORMAL.  /* Histogram with normal curve

Finally, if you are working with continuous data, you might want to transform the data into standard units, which have a mean of 0 and a standard deviation of 1. Values that are in standard units are said to be “standardized” and are called “z-scores.” You can get z-scores of any variable one of two ways: the hard way using the Compute function (Chapter 1) or the easy way using the Descriptives function. Click on Analyze → Descriptives, put your continuous variable in the box on the right, and check the “Save standardized values as variables” box. You can peek into the Options menu if you want, but there's nothing in there that you can't get from Frequencies. In fact, Frequencies offers a far more complete set of descriptive statistics than Descriptives does!


The syntax for this is simply

DESCRIPTIVES VARIABLES=age /SAVE.

After you run this syntax, notice that there will be a new variable in your “Variable View” spreadsheet called “Zage” corresponding to the z-score of age.
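Conceptually, standardizing just subtracts the mean and divides by the standard deviation. A minimal Python sketch with invented ages (SPSS standardizes with the sample standard deviation, i.e., the n − 1 denominator, which is what `statistics.stdev` uses):

```python
from statistics import mean, stdev

# Invented ages for illustration.
ages = [16, 19, 22, 25, 28, 31, 34, 45]

m, s = mean(ages), stdev(ages)        # stdev uses the sample (n - 1) denominator
z_ages = [(x - m) / s for x in ages]  # the values SPSS would save as "Zage"

# By construction, the z-scores have mean 0 and sample SD 1.
```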

We can now run all three of these commands at once by highlighting them in the syntax window and clicking the run button.

Interpreting the Output

In this experiment (data available on the website under the name “Reactance.sav”), male and female cigarette smokers viewed a cigarette advertisement that either did or did not contain the Surgeon General's warning (SGW). As you can see, about half (53%) of the participants saw the version with no warning, and 28% of the participants were men.


As for our continuous variable, notice the hearty statistics table that results from the syntax “/STATISTICS=ALL”:

Below that, we can see a rather unusual-looking histogram:


So what's going on here? If you look carefully at the histogram, it looks like there is a single 198-year-old person in the sample. That is rather old, especially given the known health risks of smoking. This outlier is confirmed by examining the range in the statistics table: from 16 to 198. On inspecting the original logbooks, we found that (surprise, surprise) our research assistant made a typo when entering this participant's data; the correct age should be 19. Upon fixing this error, the updated histogram looks much better.


A Closer Look: Eyeballing a Hypothesis Test

Although we will cover formal hypothesis testing in detail for most of the rest of the book, we want to note here that some simple hypothesis tests (e.g., differences between independent means) can be gleaned quite easily from descriptive statistics alone.

The key is that the so-called critical value in hypothesis testing, or the value that your observed test statistic must exceed in order to be called “significant,” depends upon only two things: the Type I error rate, called α (alpha), and the sample size, N. In the social sciences, alpha is typically 0.05. And it turns out that the critical value is more or less the same for sample sizes larger than about 20. This magical critical value is 2: any value more than 2 standard errors above or below the mean of a variable will be called “significantly different” from that mean.

From the descriptives, we get a report of the mean of a variable (e.g., the mean of age is 30.29) and the standard error around that mean (.981). Here, we can tell that any value smaller than 30.29 − 2*.981 = 28.33, or larger than 30.29 + 2*.981 = 32.25, will be significantly different from our mean of 30.29. So if we had hypothesized that the mean of the sample would be 25, we can tell right away that that hypothesis would be rejected.
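That arithmetic is easy to script. A quick Python check using the mean and standard error reported above:

```python
# Mean and standard error of age from the descriptives output.
m, se = 30.29, 0.981

# Any hypothesized mean outside roughly (m - 2*SE, m + 2*SE) would be
# declared significantly different at alpha = .05 for N > ~20.
lower, upper = m - 2 * se, m + 2 * se
hypothesized = 25
rejected = not (lower <= hypothesized <= upper)
```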

If you have a very small sample (e.g., N = 6), or want to be super-conservative, a critical value of 2.5 should do the trick.

A Closer Look: Assessing for Normality

As we mentioned earlier, one of the key assumptions of almost all of the statistics that researchers use is that their dependent measures are normally distributed. The histogram provides a nice visual way to tell if your data are normal. Another way to check for normality is with a “probability-probability” plot, usually denoted as a “P-P” plot. Instead of plotting the frequency at each value of the variable as in the histogram, the P-P plot shows the observed cumulative probability at each value of the variable against the expected cumulative probability of a comparison distribution. If your data are perfectly normally distributed, then the P-P plot will be a straight, 45-degree line. If your data deviate from normal, the points will fluctuate around the line. By convention, a straight line is displayed on the P-P plot to make these fluctuations more obvious. For example, if the distribution of your variable has fat tails and is flatter in the middle than a normal distribution (called platykurtic), then the 10th percentile of your data will correspond to a lower value than the 10th percentile of the normal distribution. Thus, the points on the plot will be below the straight line.
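To make the construction concrete, here is a rough Python sketch of the point pairs a P-P plot compares, using `statistics.NormalDist` as the comparison distribution; the plotting-position formula below is one common convention, not necessarily the one SPSS uses:

```python
from statistics import NormalDist, mean, stdev

def pp_points(data):
    """Pair each sorted value's empirical cumulative proportion (observed)
    with the normal CDF at that value (expected)."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    observed = [(i + 0.5) / n for i in range(n)]  # one common plotting position
    expected = [dist.cdf(x) for x in xs]
    return list(zip(observed, expected))

# For perfectly symmetric data, the middle pair sits exactly on the 45-degree line.
pairs = pp_points([1, 2, 3, 4, 5])
```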

To generate a P-P plot in SPSS, click on Analyze → Descriptive Statistics and choose the P-P Plots function. It will bring up a window where you can select your continuous dependent variable (e.g., age) and plot it against any one of a number of distributions. The default is “Normal.”

The P-P plot itself is a little tricky to read. SPSS plots the observed distribution on the x-axis, and the plot expected under the normal distribution (or whichever distribution you chose) on the y-axis. When the data points (i.e., the open circles) are above the straight line, it means that there are fewer than expected observations by that percentile, and vice versa. So, in the distribution of age, we can tell that the tails are too thin because the points are above the line at the ends. Likewise, the middle is too tall because the points are below the line there.


The good people at SPSS must agree with us that the observed-by-expected version of the P-P plot is confusing, so they did us the favor of providing a “detrended” version that follows in the output. In this version, the x-axis is the same as above (observed cumulative probability), but the y-axis has been redrawn to represent the deviation from normality at each percentile. Thus, positive values mean too many observations at that point, and negative values mean too few. It is clearer from the detrended plot that our distribution has thin tails and a fat middle (hey, just like my cats!).


So far, we have covered only visual ways of testing for normality. There are also a handful of more formal ways to check this assumption. The parameters skewness and kurtosis provide quantitative measures of the shape of a distribution. Positive skew values mean the distribution has a long tail to the right; negative skew values mean the distribution has a long tail to the left. Positive values of kurtosis mean the distribution has a high peak (leptokurtic), and negative values of kurtosis mean the distribution has a relatively flat peak (platykurtic).

SPSS provides the skewness and kurtosis values, along with their standard errors, as part of the output from the Frequencies function.


A perfectly normal distribution has a skewness of 0 and a kurtosis of 0. So, using what we learned in this chapter about eyeballing a hypothesis test, you can conclude that your distribution is not significantly different from normal if the skewness and kurtosis are within about twice their respective standard errors of zero. In this case, the skewness is significantly different from normal (because 0.96 is more than 2 × 0.24 away from 0), but the kurtosis is not (because 0.27 is within 2 × 0.47 of 0).
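The “twice the standard error” rule can be wrapped in a one-line Python check; the numbers below are the skewness and kurtosis values quoted above:

```python
def significantly_nonnormal(stat, se, crit=2.0):
    """True if a skewness or kurtosis estimate lies more than
    crit standard errors away from 0 (the value for a normal distribution)."""
    return abs(stat) > crit * se

# Values from the chapter's Frequencies output.
skew_sig = significantly_nonnormal(0.96, 0.24)  # 0.96 > 2 * 0.24 -> significant
kurt_sig = significantly_nonnormal(0.27, 0.47)  # 0.27 < 2 * 0.47 -> not significant
```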

Keywords

• descriptive statistics • categorical variable • continuous variable • frequencies • normal distribution • skewness • kurtosis • histogram • standard errors


• hypothesis testing • syntax


