BIOSTAT 8

2. The Correlational analysis learning activity assignment is worth 50 points and will be graded using the designated rubric. Grading criteria include quality of content, appropriate citations, use of Standard English grammar, and overall organization and readability.

3. Create your assignment using a Microsoft Word application. The document should be saved in a .doc or .docx format.

4. There is no required length, but the document should be detailed enough to address all requirements.

5. The following sections should be included in the document (please copy and paste SPSS output to Word document):

a. Question #1

b. Question #2

c. Question #3

d. Attached output files in .spv format

Use the provided data set sample and use SPSS to analyze the data. Use the SPSS Screenshot Guide as a reference when running the tests.

Correlation

1. Using the data file survey.sav follow the instructions in Chapter 11 to explore the relationship between the total mastery scale and life satisfaction. Do the same for total negative affect and total perceived stress. Present the results in a brief report.

2. Use the instructions in Chapter 11 to generate a full correlation matrix to check the intercorrelations among the following variables. Please copy and paste the SPSS table to the Word document.

(a) age (b) perceived stress (tpstress) (c) positive affect (tposaff) (d) negative affect (tnegaff) (e) life satisfaction (tlifesat)

3. Gill, a researcher, is interested in exploring the impact of age on the experience of positive affect (tposaff), negative affect (tnegaff) and perceived stress (tpstress). Follow the instructions in Chapter 11 of the SPSS Survival Manual to generate a condensed correlation matrix which presents the correlations between age with positive affect, negative affect and perceived stress. Please copy and paste the SPSS table to the Word document.

Correlation

Correlation analysis is used to describe the strength and direction of the linear relationship between two variables. There are several different statistics available from IBM SPSS Statistics, depending on the level of measurement and the nature of your data. In this chapter, the procedure for obtaining and interpreting a Pearson product-moment correlation coefficient (r) is presented, along with Spearman Rank Order Correlation (rho). Pearson r is designed for interval level (continuous) variables. It can also be used if you have one continuous variable (e.g. scores on a measure of self-esteem) and one dichotomous variable (e.g. sex: male/female). Spearman rho is designed for use with ordinal level, or ranked, data and is particularly useful when your data do not meet the criteria for Pearson correlation.

IBM SPSS Statistics can calculate two types of correlation for you. First, it can give you a simple bivariate correlation (which just means between two variables), also known as ‘zero-order correlation’. It will also allow you to explore the relationship between two variables while controlling for another variable. This is known as ‘partial correlation’. The procedure to obtain a bivariate Pearson r and non-parametric Spearman rho is presented here in Chapter 11. Partial correlation is covered in Chapter 12.

Pearson correlation coefficients (r) can only take on values from –1 to +1. The sign out the front indicates whether there is a positive correlation (as one variable increases, so too does the other) or a negative correlation (as one variable increases, the other decreases). The size of the absolute value (ignoring the sign) provides an indication of the strength of the relationship. A perfect correlation of 1 or –1 indicates that the value of one variable can be determined exactly by knowing the value on the other variable. A scatterplot of this relationship would show a straight line. On the other hand, a correlation of 0 indicates no relationship between the two variables. Knowing the value on one of the variables provides no assistance in predicting the value on the second variable. A scatterplot would show a circle of points, with no pattern evident.
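The behaviour of r at its two extremes can be checked by hand. The following is a minimal Python sketch, not part of the SPSS procedure; the function name pearson_r and the scores are invented for illustration and do not come from survey.sav.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, invented for illustration
scores = [1, 2, 3, 4, 5]
rises_with_scores = [2 * s + 1 for s in scores]   # exact straight line, upward
falls_with_scores = [10 - s for s in scores]      # exact straight line, downward

print(pearson_r(scores, rises_with_scores))
print(pearson_r(scores, falls_with_scores))
```

Because both derived lists are exact straight-line functions of the first, the two results sit at the positive and negative ends of the range.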

There are several issues associated with the use of correlation that you need to consider. These include the effect of non-linear relationships, outliers, restriction of range, correlation versus causality and statistical versus practical significance. These topics are discussed in the introduction to Part Four of this book. I would strongly recommend that you read through that material before proceeding with the remainder of this chapter.

DETAILS OF EXAMPLE

To demonstrate the use of correlation, I explore the interrelationships among some of the variables included in the survey.sav data file provided on the website accompanying this book. The survey was designed to explore the factors that affect respondents’ psychological adjustment and wellbeing (see the Appendix for a full description of the study). In this example, I am interested in assessing the correlation between respondents’ feelings of control and their level of perceived stress. If you wish to follow along with this example, you should start IBM SPSS Statistics and open the survey.sav file.

Example of research question: Is there a relationship between the amount of control people have over their internal states and their levels of perceived stress? Do people with high levels of perceived control experience lower levels of perceived stress?

What you need: Two variables: both continuous, or one continuous and the other dichotomous (two values).

What it does: Correlation describes the relationship between two continuous variables, in terms of both the strength of the relationship and the direction.

Assumptions: See the introduction to Part Four.

Non-parametric alternative: Spearman Rank Order Correlation (rho).
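One way to see how Spearman rho relates to Pearson r is that rho is Pearson r calculated on the ranks of the scores, with tied scores given the average of their ranks. The sketch below illustrates this in Python; the function names and the data are hypothetical and are not taken from survey.sav.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rho: Pearson r applied to the ranked scores."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 400]

print("Pearson r:   ", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

In this invented example the ordering of the two variables agrees perfectly, so rho reaches its maximum, while r is pulled below 1 by the extreme final value.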

PRELIMINARY ANALYSES FOR CORRELATION

Before performing a correlation analysis, it is a good idea to generate a scatterplot. This enables you to check for outliers and for violation of the assumption of linearity (see introduction to Part Four). Inspection of the scatterplot also gives you a better idea of the nature of the relationship between your variables.

To generate a scatterplot between your independent variable (Total PCOISS) and dependent variable (Total perceived stress) follow the instructions detailed in Chapter 7. The output generated from this procedure is shown below.

[Scatterplot of Total perceived stress against Total PCOISS; image not reproduced]

Interpretation of output from Scatterplot

The scatterplot can be used to check several aspects of the distribution of these two variables.

Step 1: Check for outliers

Check your scatterplot for outliers, that is, data points that are out on their own, either very high or very low, or away from the main cluster of points. Extreme outliers are worth checking: Was the information entered correctly? Could these values be errors? Outliers can seriously influence some analyses, so this is worth investigating. Some statistical texts recommend removing extreme outliers from the data set. Others suggest recoding them down to a value that is not so extreme (see Chapter 6).

If you identify an outlier and want to find out the ID number of the case, you can use the Data Label Mode icon in the Chart Editor. Double-click on the chart to activate the Chart Editor window. Click on the icon that looks a bit like a bullseye (or choose Data Label Mode from the Elements menu) and move your cursor to the point on the graph you wish to identify. Click on it once and a number will appear. This is the ID number if you selected ID in Step 4 of the Scatterplot instructions in Chapter 7; otherwise, the case number assigned by IBM SPSS Statistics will be displayed. To turn the numbering off, just click on the icon again.

Step 2: Inspect the distribution of data points

The distribution of data points can tell you various things about your data:

• Are the data points spread all over the place? This suggests a very low correlation.

• Are all the points neatly arranged in a narrow cigar shape? This suggests quite a strong correlation.

• Could you draw a straight line through the main cluster of points, or would a curved line better represent the points? If a curved line is evident (suggesting a curvilinear relationship) Pearson correlation should not be used, as it assumes a linear relationship.

Step 3: Determine the direction of the relationship between the variables

The scatterplot can tell you whether the relationship between your two variables is positive or negative. If a line were drawn through the points, what direction would it point—from left to right, upward or downward? An upward trend indicates a positive relationship; high scores on the X axis are associated with high scores on the Y axis. A downward line suggests a negative correlation; low scores on the X axis are associated with high scores on the Y axis. In this example, we appear to have a negative correlation of moderate strength.

Once you have explored the distribution of scores on the scatterplot and established that the relationship between the variables is roughly linear and that the scores are evenly spread in a cigar shape, you can proceed with calculating Pearson or Spearman correlation coefficients.

Before you start the following procedure, choose Edit from the menu, select Options, and on the General tab make sure there is a tick in the box No scientific notation for small numbers in tables in the Output section.

Procedure for requesting Pearson r or Spearman rho

1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Bivariate.

2. Select your two variables and move them into the box marked Variables (e.g. Total perceived stress: tpstress, Total PCOISS: tpcoiss). If you wish you can list a whole range of variables here, not just two. In the resulting matrix, the correlation between all possible pairs of variables will be listed. This can be quite large if you list more than just a few variables.

3. In the Correlation Coefficients section, the Pearson box is the default option. If you wish to request the Spearman rho (the non-parametric alternative), tick the Spearman box instead (or as well).

4. Click on the Options button. For Missing Values, click on the Exclude cases pairwise box. Under Options, you can also obtain means and standard deviations if you wish.

5. Click on Continue and then on OK (or on Paste to save to Syntax Editor).

The syntax generated from this procedure is:

CORRELATIONS

/VARIABLES=tpstress tpcoiss

/PRINT=TWOTAIL NOSIG

/MISSING=PAIRWISE.

NONPAR CORR

/VARIABLES=tpstress tpcoiss

/PRINT=SPEARMAN TWOTAIL NOSIG

/MISSING=PAIRWISE.

The output generated from this procedure (showing both Pearson and Spearman results) is presented below.

Correlations

		tpstress Total perceived stress	tpcoiss Total PCOISS
tpstress Total perceived stress	Pearson Correlation	1	-.581**
	Sig. (2-tailed)		.000
	N	433	426
tpcoiss Total PCOISS	Pearson Correlation	-.581**	1
	Sig. (2-tailed)	.000
	N	426	430

**. Correlation is significant at the 0.01 level (2-tailed).

Nonparametric Correlations

			tpstress Total perceived stress	tpcoiss Total PCOISS
Spearman's rho	tpstress Total perceived stress	Correlation Coefficient	1.000	-.556**
		Sig. (2-tailed)		.000
		N	433	426
	tpcoiss Total PCOISS	Correlation Coefficient	-.556**	1.000
		Sig. (2-tailed)	.000
		N	426	430

**. Correlation is significant at the 0.01 level (2-tailed).

INTERPRETATION OF OUTPUT FROM CORRELATION

For both Pearson and Spearman results, IBM SPSS Statistics provides you with a table giving the correlation coefficients between each pair of variables listed, the significance level and the number of cases. The results for Pearson correlation are shown in the section headed Correlations. If you requested Spearman rho, these results are shown in the section labelled Nonparametric Correlations. You interpret the output from the parametric and non-parametric approaches in the same way.

Step 1: Check the information about the sample

The first thing to inspect in the table labelled Correlations is the N (number of cases). Is this correct? If there are a lot of missing data, you need to find out why. Did you forget to tick the Exclude cases pairwise box in the missing data option? Using listwise deletion (the other option) means any case with missing data on any of the variables will be removed from the analysis. This can sometimes severely restrict your N. In the above example we have 426 cases that had scores on both of the scales used in this analysis. If a case was missing information on either of these variables, it would have been excluded from the analysis.
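The difference between pairwise and listwise exclusion only shows up when more than two variables are listed. The following Python sketch counts the N that each option would leave; the five cases and their scores are invented for illustration, and the variable names are borrowed from survey.sav only as labels.

```python
# Hypothetical cases; None marks a missing score
cases = [
    {"tpstress": 25, "tpcoiss": 60, "tmast": 22},
    {"tpstress": 30, "tpcoiss": None, "tmast": 18},
    {"tpstress": 28, "tpcoiss": 55, "tmast": None},
    {"tpstress": None, "tpcoiss": 70, "tmast": 25},
    {"tpstress": 22, "tpcoiss": 65, "tmast": 24},
]

def pairwise_n(cases, var_a, var_b):
    """Cases with scores on both variables of one particular pair."""
    return sum(1 for c in cases
               if c[var_a] is not None and c[var_b] is not None)

def listwise_n(cases, variables):
    """Cases with scores on every variable in the analysis."""
    return sum(1 for c in cases
               if all(c[v] is not None for v in variables))

variables = ["tpstress", "tpcoiss", "tmast"]
print("Pairwise N, tpstress with tpcoiss:", pairwise_n(cases, "tpstress", "tpcoiss"))
print("Pairwise N, tpstress with tmast:  ", pairwise_n(cases, "tpstress", "tmast"))
print("Listwise N, all three variables:  ", listwise_n(cases, variables))
```

Under pairwise exclusion each correlation keeps every case that has both of its own scores, so N can differ from cell to cell in the matrix; under listwise exclusion a single missing score removes the case from every correlation.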

Step 2: Determine the direction of the relationship

The second thing to consider is the direction of the relationship between the variables. Is there a negative sign in front of the correlation coefficient value? This would suggest a negative (inverse) correlation between the two variables (i.e. high scores on one are associated with low scores on the other). The interpretation of this depends on the way the variables are scored. Always check with your questionnaire, and remember that for many scales some items are negatively worded and therefore are reversed before scoring. What do high values really mean? This is one of the major areas of confusion for students, so make sure you get this clear in your mind before you interpret the correlation output.

In the example given here, the Pearson correlation coefficient (r = –.58) and Spearman value (rho = –.56) are negative, indicating a negative correlation between perceived control and stress. The more control people feel they have, the less stress they experience.

Step 3: Determine the strength of the relationship

The third thing to consider in the output is the size of the correlation coefficient. This can range from –1 to +1. This value will indicate the strength of the relationship between your two variables. A correlation of 0 indicates no relationship at all, a correlation of 1 indicates a perfect positive correlation, and a value of –1 indicates a perfect negative correlation.

How do you interpret values between 0 and 1? Different authors suggest different interpretations; however, Cohen (1988, pp. 79–81) suggests the following guidelines:

small r = .10 to .29

medium r = .30 to .49

large r = .50 to 1.00

These guidelines apply whether or not there is a negative sign out the front of your r value. Remember, the negative sign refers only to the direction of the relationship, not the strength. The strength of correlation of r = .5 and r = –.5 is the same. It is only in a different direction.

In the example presented above, there is a large correlation between the two variables (above .5), suggesting quite a strong relationship between perceived control and stress.
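Cohen's guidelines can be written as a small lookup that ignores the sign of the coefficient. This is a sketch only: the function name is hypothetical, and the label used for values under .10 is an assumption, since the guidelines quoted above do not name that range.

```python
def cohen_strength(r):
    """Label the size of a correlation using Cohen's (1988) guidelines."""
    size = abs(r)  # the sign gives direction only, not strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"  # assumed label; not part of the quoted guidelines

# Coefficients reported in this chapter's output tables
for r in (-0.581, -0.556, -0.220, -0.394):
    print(f"r = {r:+.3f}: {cohen_strength(r)}")
```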

Step 4: Calculate the coefficient of determination

To get an idea of how much variance your two variables share, you can also calculate what is referred to as the ‘coefficient of determination’. Sounds impressive, but all you need to do is square your r value (multiply it by itself). To convert this to percentage of variance, just multiply by 100 (shift the decimal place two columns to the right). For example, two variables that correlate r = .2 share only .2 × .2 = .04 = 4% of their variance. There is not much overlap between the two variables. A correlation of r = .5, however, means 25 per cent shared variance (.5 × .5 = .25).

In our example the Pearson correlation is .581, which, when squared, indicates 33.76 per cent shared variance. Perceived control helps to explain nearly 34 per cent of the variance in respondents’ scores on the Perceived Stress Scale. This is quite a respectable amount of variance explained when compared with a lot of the research conducted in the social sciences.
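The arithmetic in Step 4 can be reproduced in a few lines of Python. The function name is invented for illustration; the three r values are the ones used in the text above.

```python
def shared_variance_percent(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return r ** 2 * 100

# The two illustrative values from the text, then the chapter example
for r in (0.2, 0.5, -0.581):
    print(f"r = {r:+.3f}: {shared_variance_percent(r):.2f}% shared variance")
```

Squaring removes the sign, so a negative correlation shares the same amount of variance as a positive correlation of the same size.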

Step 5: Assess the significance level

The next thing to consider is the significance level (listed as Sig. (2-tailed) in the output). This is a frequently misinterpreted area, so care should be exercised here. The level of statistical significance does not indicate how strongly the two variables are associated (this is given by r or rho), but instead it indicates how much confidence we should have in the results obtained. The significance of r or rho is strongly influenced by the size of the sample. In a small sample (e.g. n = 30), you may have moderate correlations that do not reach statistical significance at the traditional p < .05 level. In large samples (n = 100+), however, very small correlations (e.g. r = .2) may reach statistical significance. While you need to report statistical significance, you should focus on the strength of the relationship and the amount of shared variance (see Step 4).
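The influence of sample size on significance can be illustrated with a rough calculation. The Python sketch below uses the Fisher z transformation with a normal approximation, which is an assumption made here for simplicity and is not the test SPSS itself reports, so the p values will differ slightly from SPSS output. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_two_tailed_p(r, n):
    """Approximate two-tailed p for H0: population correlation = 0.

    Uses the Fisher z transformation and a normal approximation,
    so it is a rough guide rather than the exact value from SPSS.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same or similar r values evaluated at a small and a larger sample size
for r, n in [(0.3, 30), (0.2, 30), (0.2, 100)]:
    p = approx_two_tailed_p(r, n)
    print(f"r = {r:.2f}, n = {n:3d}: approximate p = {p:.3f}")
```

With these inputs the medium-sized correlation in the small sample stays above .05, while the smaller correlation in the larger sample falls just below it, which is the pattern described in Step 5.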

When publishing in some literature areas (particularly health and medical) you may be asked to provide confidence intervals for your correlation coefficients, that is, the range of values in which we are 95 per cent confident the true value lies (if we actually could measure it!). Unfortunately, IBM SPSS Statistics does not provide these; however, there are some websites that provide online calculators for you. If you need to obtain these values I suggest that you go to the website http://vassarstats.net/rho.html. All you need to provide is the r and n values that are available in your SPSS output. You might like to check out the whole VassarStats website, which provides a range of tools for performing statistical computation (http://vassarstats.net/).
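If you would rather compute the interval yourself, one common approach is the Fisher z transformation. The Python sketch below assumes that method; whether the online calculator uses exactly the same steps is not confirmed here, so small differences in the final decimal places are possible. The function name is hypothetical, and the r and n values are those from the Pearson output above.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then transform back to r
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

lower, upper = fisher_ci(-0.581, 426)
print(f"95% CI for r = -.581, n = 426: [{lower:.3f}, {upper:.3f}]")
```

The interval is not symmetrical around r, because the transformation back from the z scale compresses values as they approach –1 or +1.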

PRESENTING THE RESULTS FROM CORRELATION

The results of the above example using Pearson correlation could be presented in a research report as follows. If you need to report the results for Spearman’s correlation, just replace the r value with the rho value shown in the output. Typically, the value of r is presented using two decimal places (rather than the three decimal places provided in the SPSS output), but check with the conventions used in the journals in your literature area. For correct APA style the statistics are presented in italics (r, n, p).

The relationship between perceived control of internal states (as measured by the PCOISS) and perceived stress (as measured by the Perceived Stress Scale) was investigated using a Pearson product-moment correlation coefficient. Preliminary analyses were performed to ensure no violation of the assumptions of normality and linearity. There was a strong negative correlation between the two variables, r = –.58, n = 426, p < .001, with high levels of perceived control associated with lower levels of perceived stress.

Correlation is often used to explore the relationship among a group of variables, rather than just two as described above. In this case, it would be awkward to report all the individual correlation coefficients in a paragraph; it would be better to present them in a table. One way this could be done is shown below.

Table 1

Pearson Product-Moment Correlations Between Measures of Perceived Control and Wellbeing

Scale	1	2	3	4	5
1. Total PCOISS	–
2. Total perceived stress	–.58**	–
3. Total negative affect	–.48**	.67**	–
4. Total positive affect	.46**	–.44**	–.29**	–
5. Total life satisfaction	.37**	–.49**	–.32**	.42**	–

Note. PCOISS = Perceived Control of Internal States Scale. ** p < .001 (2-tailed).

For other examples of how to present the results of correlation see Chapter 7 in Nicol and Pexman (2010b).

OBTAINING CORRELATION COEFFICIENTS BETWEEN GROUPS OF VARIABLES

In the previous procedures section, I showed you how to obtain correlation coefficients between two continuous variables. If you have a group of variables and you wish to explore the interrelationships among all of them, you can ask IBM SPSS Statistics to do this in one procedure. Just include all the variables in the Variables box. This can, however, result in an enormous correlation matrix that can be difficult to read and interpret.

Sometimes, you want to explore only a subset of all these possible relationships. For example, you might want to assess the relationship between control measures (Mastery, PCOISS) and a set of adjustment and wellbeing measures (positive affect, negative affect, life satisfaction). You don’t want a full correlation matrix, because this would give you correlation coefficients among all the variables, including between each of the various pairs of adjustment measures. There is a way that you can limit the correlation coefficients that are displayed. This involves using Syntax Editor (described in Chapter 3) to limit the correlation coefficients that are produced by IBM SPSS Statistics.

Procedure for obtaining correlation coefficients between two groups of variables

1. From the menu at the top of the screen, click on Analyze, then select Correlate, then Bivariate.

2. Move the variables of interest into the Variables box. Select the first group of variables (e.g. Total positive affect: tposaff, Total negative affect: tnegaff, Total life satisfaction: tlifesat), followed by the second group (e.g. Total PCOISS: tpcoiss, Total Mastery: tmast). In the output that is generated, the first group of variables will appear down the side of the table as rows and the second group will appear across the table as columns. Put your longer list first; this stops your table being too wide to appear on one page.

3. Click on Paste. This opens the Syntax Editor window.

4. Put your cursor between the first group of variables (e.g. tposaff, tnegaff, tlifesat) and the other variables (e.g. tpcoiss and tmast). Type in the word WITH (tposaff tnegaff tlifesat with tpcoiss tmast). This will ask IBM SPSS Statistics to calculate correlation coefficients between tmast and tpcoiss and each of the other variables listed.

The final syntax should be:

CORRELATIONS

/VARIABLES=tposaff tnegaff tlifesat with tpcoiss tmast

/PRINT=TWOTAIL NOSIG

/MISSING=PAIRWISE.

5. To run this new syntax, you need to highlight the text from CORRELATIONS down to and including the full stop at the end. It is very important that you include the full stop in the highlighted section.

6. With this text highlighted, click on the green triangle or arrow-shaped icon, or, alternatively, click on Run from the Menu, and then Selection from the drop-down menu that appears.

The output generated from this procedure is shown as follows.

		tpcoiss Total PCOISS	tmast Total Mastery
tposaff Total positive affect	Pearson Correlation	.456**	.432**
	Sig. (2-tailed)	.000	.000
	N	429	436
tnegaff Total negative affect	Pearson Correlation	-.484**	-.464**
	Sig. (2-tailed)	.000	.000
	N	428	435
tlifesat Total life satisfaction	Pearson Correlation	.373**	.444**
	Sig. (2-tailed)	.000	.000
	N	429	436

**. Correlation is significant at the 0.01 level (2-tailed).

Presented in this manner, it is easy to compare the relative strength of the correlations for my two control scales (Total PCOISS, Total Mastery) with each of the adjustment measures.

COMPARING THE CORRELATION COEFFICIENTS FOR TWO GROUPS

Sometimes, when doing correlational research, you may want to compare the strength of the correlation coefficients for two separate groups. For example, you may want to explore the relationship between optimism and negative affect for males and females separately. One way that you can do this is described below.

Procedure for comparing correlation coefficients for two groups of participants

Step 1: Split the sample

1. From the menu at the top of the screen, click on Data, then select Split File.

2. Click on Compare Groups.

3. Move the grouping variable (e.g. sex) into the box labelled Groups based on. Click on OK (or on Paste to save to Syntax Editor). If you use Syntax, remember to run the procedure by highlighting the command and clicking on Run.

4. This will split the sample by sex and repeat any analyses that follow for these two groups separately.

The syntax for this command is:

SORT CASES BY sex.

SPLIT FILE LAYERED BY sex.

Step 2: Run correlation

Follow the steps in the earlier section of this chapter to request the correlation between your two variables of interest (e.g. Total optimism: toptim, Total negative affect: tnegaff). The results will be reported separately for the two groups.

The syntax for this command is:

CORRELATIONS

/VARIABLES=toptim tnegaff

/PRINT=TWOTAIL NOSIG

/MISSING=PAIRWISE.

Important: The Split File operation stays in place until you turn it off. Therefore, when you have finished examining males and females separately you will need to turn the Split File option off. To do this, click on Data, Split File and select the first button: Analyze all cases, do not create groups.

The output generated from the correlation procedure is shown below.

sex sex			toptim Total Optimism	tnegaff Total negative affect
1 MALES	toptim Total Optimism	Pearson Correlation	1	-.220**
		Sig. (2-tailed)		.003
		N	184	184
	tnegaff Total negative affect	Pearson Correlation	-.220**	1
		Sig. (2-tailed)	.003
		N	184	185
2 FEMALES	toptim Total Optimism	Pearson Correlation	1	-.394**
		Sig. (2-tailed)		.000
		N	251	250
	tnegaff Total negative affect	Pearson Correlation	-.394**	1
		Sig. (2-tailed)	.000
		N	250	250

**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation of output from correlation for two groups

From the output given above, the correlation between Total optimism and Total negative affect for males was r = –.22, while for females it was somewhat stronger, r = –.39. Although these two values seem different, is this difference big enough to be considered significant? Detailed in the next section is one way that you can test the statistical significance of the difference between these two correlation coefficients. It is important to note that this process is different from testing the statistical significance of the correlation coefficients reported in the output table above. The significance levels reported above (for males: p = .003; for females: p < .001, displayed as .000) provide a test of the null hypothesis that the correlation coefficient in the population is 0. The significance test described below, however, assesses the probability that the difference in the correlations observed for the two groups (males and females) would occur as a function of sampling error, when in fact there was no real difference in the strength of the relationship for males and females.

TESTING THE STATISTICAL SIGNIFICANCE OF THE DIFFERENCE BETWEEN CORRELATION COEFFICIENTS

In this section, I describe a procedure that can be used to find out whether the correlations for the two groups (males/females) are significantly different. IBM SPSS Statistics does not provide this information. The quickest way to check this is to use an online calculator. One of the easiest ones I have found to use is available at http://vassarstats.net/rdiff.html.

Assumptions

As always, there are assumptions to check first. It is assumed that the r values for the two groups were obtained from random samples and that the two groups of cases are independent (not the same participants tested twice). The distribution of scores for the two groups is assumed to be normal (see histograms for the two groups). It is also necessary to have at least 20 cases in each of the groups.

Procedure for using the online calculator

From the IBM SPSS Statistics Correlation output, find the r value and n for Group 1 (males) and Group 2 (females).

Males: ra = .220

na = 184

Females: rb = .394

nb = 250

In the online calculator (http://vassarstats.net/rdiff.html) enter this information into the boxes provided for Sample A (males) and Sample B (females) and press the Calculate button.

The result of the procedure will appear in the boxes labelled z, p (one-tailed) and p (two-tailed). In this example the z value is –1.97 and the p (two-tailed) is .0488. Given that the p value is less than .05 the result is statistically significant. We can conclude that there is a statistically significant difference in the strength of the correlation between optimism and negative affect for males and females. Optimism explains significantly more of the variance in negative affect for females than for males.
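The calculation behind this kind of comparison can also be sketched in Python. The version below assumes the standard Fisher z test for two independent correlations; that the online calculator follows exactly these steps is an assumption, and the function name is hypothetical. The r values are entered without their minus signs, as in the procedure above. Because both correlations are negative, entering them with their signs would change only the sign of z, not the two-tailed p.

```python
import math
from statistics import NormalDist

def compare_independent_rs(r_a, n_a, r_b, n_b):
    """z test for the difference between two independent correlations."""
    z_a = math.atanh(r_a)   # Fisher z for sample A
    z_b = math.atanh(r_b)   # Fisher z for sample B
    # Standard error of the difference between the two z values
    se = math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    z = (z_a - z_b) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Sample A = males, Sample B = females (values from the split-file output)
z, p = compare_independent_rs(0.220, 184, 0.394, 250)
print(f"z = {z:.2f}, p (two-tailed) = {p:.4f}")
```

The n – 3 terms in the standard error are one reason very small groups are a problem for this test: the standard error grows quickly as either group shrinks, in line with the minimum of 20 cases per group noted under Assumptions.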

ADDITIONAL EXERCISES

Health

Data file: sleep.sav. See Appendix for details of the data file.

1. Check the strength of the correlation between scores on the Sleepiness and Associated Sensations Scale (totSAS) and the Epworth Sleepiness Scale (ess).

2. Use Syntax to assess the correlations between the Epworth Sleepiness Scale (ess) and each of the individual items that make up the Sleepiness and Associated Sensations Scale (fatigue, lethargy, tired, sleepy, energy).