WK 7 DIS. EPID
NURS 8211 Research for an evidence-based practice week 7: Correlation
Objectives
Analyze sample data for associations and relationships
Interpret a scatterplot to determine the relationship between two continuous variables
Consider the role of theory in determining causality
PEARSON’S R AND SPEARMAN’S RHO
Correlational techniques look at relationships, or associations, between variables
The relationship between two variables is called a bivariate relationship
Finding out that two variables are related is not the same as being able to state that x caused y
It is also possible to examine the relationship between two variables, while controlling for the effect of a third
This is called partial correlation
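A minimal sketch of how a partial correlation can be computed from the three pairwise Pearson correlations. The data and the helper names `pearson_r` and `partial_r` are hypothetical, chosen only for illustration; statistical software normally does this for you.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: x and y both track a third variable z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2, 3, 2, 3, 6, 7, 6, 7]

r_xy = pearson_r(x, y)             # plain bivariate correlation
r_xy_given_z = partial_r(x, y, z)  # same pair, with z held constant
print(round(r_xy, 3), round(r_xy_given_z, 3))
```

In this made-up example the bivariate correlation between x and y is strong, but it nearly disappears once z is controlled for, which is the kind of pattern partial correlation is meant to reveal.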
Pearson’s r and Spearman’s rho
Criteria for deciding between the two statistics:
Pearson’s Assumptions
Normality – use the usual tests (normal curve, skewness, etc.)
Linearity - the two variables must have a linear relationship
Homoscedasticity – data points cluster along a diagonal
The assumptions required to run Pearson’s correlation are that:
Both variables being analyzed are normally distributed - You can use the usual tests for normality for the two variables that we have used in previous slide sets.
There is a linear relationship between them. A linear relationship is one that, when plotted on a graph, traces a straight line: any given change in one variable is accompanied by a proportional change in the other.
Homoscedasticity: the data points cluster evenly around the diagonal line. This assumption means that the variance around the regression line is the same for all values of the predictor variable (X), so the points should form a band of roughly equal width along the line. When the spread of the points around the regression line changes across values of X, the data show heteroscedasticity rather than homoscedasticity.
There are no noticeable outliers.
To test for linearity, homoscedasticity, and outliers, you can create a scatter plot (see the next slide) for a quick visual inspection, though there are statistical tests you can employ for a more formal analysis.
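As a rough companion to the visual checks, skewness can be computed by hand. The sketch below uses the simple moment-based formula on two hypothetical samples; the function name `sample_skewness` is an assumption, and statistical packages often apply a small-sample adjustment, so their values may differ slightly.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # hypothetical, evenly spread
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long right tail

print(sample_skewness(symmetric), sample_skewness(right_skewed))
```

A value near zero is consistent with a symmetric distribution, while a clearly positive or negative value suggests a skewed variable for which Spearman's rho may be the safer choice.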
Checking Linearity and Homoscedasticity
To check for linearity and homoscedasticity, create a scatter plot in Excel.
If you look at the two sets of graphs in the slide, the first set demonstrates linearity, while the second shows the difference between homoscedasticity and heteroscedasticity.
Linearity: In the first set of graphs, you can see in the top left graph that the plot points scatter around the diagonal (i.e., regression line) in a linear fashion. This is what we mean when we say that the variables demonstrate a linear relationship. The other three plots either do not follow a straight line and/or have significant outliers. If you were to plot your two variables against one another and saw anything that looked like any of the other three plots, you would be better off using a Spearman’s rho for your correlational analysis.
The second set of graphs demonstrates the concept of homoscedasticity. Notice that the plot points in the graph on the left are scattered around an imaginary diagonal line, with most of the points falling within the middle of the distribution and tapering a bit at either end (cigar shape). In a homoscedastic plot, points are evenly dispersed both above and below the regression line, representing equal variances.
This graph demonstrates homoscedasticity, so you would be safe to use Pearson’s r for your correlation coefficient. On the other hand, the data points in the graph on the right are tightly clustered at the bottom left corner of the graph and spread out toward the top right corner. This second graph demonstrates heteroscedasticity, and if you saw this with your data you should use a Spearman’s rho correlation.
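The fan shape described above can also be checked crudely with numbers. The sketch below fits a least-squares line and compares the average squared residual in the upper half of the X values with that in the lower half. It is an informal illustration on hypothetical data, not a formal test, and the function name `residual_spread_ratio` is an assumption.

```python
def residual_spread_ratio(x, y):
    """Mean squared residual for the larger x values divided by that for the smaller ones."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares line through the points.
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    # Residuals, ordered from the smallest x to the largest x.
    residuals = [b - (intercept + slope * a) for a, b in sorted(zip(x, y))]
    half = n // 2
    low, high = residuals[:half], residuals[n - half:]
    ms_low = sum(r * r for r in low) / len(low)
    ms_high = sum(r * r for r in high) / len(high)
    return ms_high / ms_low


# Hypothetical fan-shaped data: the scatter widens as x increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 6.0, 5.0, 9.0, 6.0]

ratio = residual_spread_ratio(x, y)
print(round(ratio, 1))
```

A ratio close to 1 is what an evenly spread (homoscedastic) plot would give; a ratio far from 1, as with these made-up points, matches the tapered-then-spread pattern of heteroscedasticity.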
Correlation hypothesis and sample practice-focused question
Null hypothesis: There is no relationship between minutes spent in aerobic activity per week and perceived stress in emergency department nurses.
Practice-focused question: Is there an association between minutes spent in aerobic activity per week and perceived stress in staff emergency department nurses?
Correlation assumptions & characteristics
Assumptions:
Both variables must be measured on a continuous basis.
Normality
Linearity
Characteristics:
Can be negative or positive
Best visualized in a scatterplot
Does NOT imply causality
Have some fun with correlation
https://www.tylervigen.com/spurious-correlations
These are actually two line graphs, but as you can see, they are pretty closely aligned. Take a look at them. Is there an association between how many people watch the daytime soap opera “Days of Our Lives” and the number of dietetic technicians working in NC? That’s crazy, right? There is no basis for making this connection, regardless of what the data show.
This is called a spurious correlation: a statistical association with no real basis.
I show it because it makes a good point: you MUST have theory and support for any correlation that you pose.
The time in minutes spent in aerobic activity and perceived stress is a good example of a theory-based practice-focused question. There is plenty of evidence (lots of research, the highest form of evidence) that supports this relationship.
This would be a negative relationship; that is, as time in minutes spent in aerobic activity goes UP, the perceived stress in participants would go DOWN.
Let’s take a look at these two variables and see how this plays out with some sample data.
Correlation
Correlations
| | | Hours of Aerobic Exercise per Week | Perceived Stress |
| Hours of Aerobic Exercise per Week | Pearson Correlation | 1 | -.788** |
| | Sig. (1-tailed) | | <.001 |
| | N | 30 | 30 |
| Perceived Stress | Pearson Correlation | -.788** | 1 |
| | Sig. (1-tailed) | <.001 | |
| | N | 30 | 30 |
**. Correlation is significant at the 0.01 level (1-tailed).
There were 30 observations in this dataset. In addition to exercise and stress, self-esteem, life satisfaction, and IQ were also measured.
The parametric statistic that measures correlation is called the Pearson product-moment r. It is represented in statistical notation by a lowercase r in italics. It measures correlation using a coefficient that can run from -1 (a perfect negative correlation) to +1 (a perfect positive correlation).
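For reference, the standard textbook definition of the coefficient can be written as follows, where x_i and y_i are the paired observations, x-bar and y-bar are their means, and n is the number of pairs. This notation is not taken from the slides; it is added here only to show why the value is bounded by -1 and +1.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator captures how the two variables vary together, and the denominator rescales it by how much each variable varies on its own, so the result does not depend on the units of measurement.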
As the coefficient's absolute value gets closer to 1, the strength of the correlation increases. A correlation with an absolute value over .65 is typically a strong relationship. Between
The p value can be set to a one-sided or a two-sided test of significance.
Spearman’s rho is another type of correlation, but this one is a nonparametric test. You might use it if something in the data suggested a lack of normality.
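One way to see what makes Spearman's rho nonparametric is that it is simply Pearson's r computed on the ranks of the data rather than on the raw values. A minimal sketch on hypothetical data follows; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but along a curve (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because only the ordering matters, rho reaches 1 for any relationship that consistently increases, even a curved one, whereas Pearson's r falls short of 1 when the points do not lie on a straight line.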
Note that the correlation is the same in both cells of the table. You get the same result whether you correlate exercise with stress or stress with exercise.
Note that the scatterplot, by convention, has the dependent or outcome variable on the y axis (the vertical axis). That leaves the hours of aerobic exercise per week on the x axis (the horizontal axis). What you can see visually is that as the time spent in aerobic activity increases, the stress level decreases. Thus, the “picture worth 1,000 words” shows the line of best fit moving diagonally from the upper left of the graph to the lower right. The correlation coefficient of -.788 shows a strong negative correlation with a statistically significant p value of <.001. There is more on the graph, and we will move into that linear regression discussion next week.
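To connect the reported coefficient to its p value, the usual significance test converts r into a t statistic with n - 2 degrees of freedom. The sketch below applies that standard formula to the values in the table (r = -.788, N = 30); the function name `t_statistic_for_r` is hypothetical, and the p value itself would normally come from software or a t table.

```python
import math


def t_statistic_for_r(r, n):
    """t statistic for testing the null hypothesis of no correlation (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = -0.788  # Pearson correlation from the table
n = 30      # number of observations
t = t_statistic_for_r(r, n)
df = n - 2
print(round(t, 2), df)
```

The resulting t is large in magnitude relative to the one-tailed .01 critical value listed in standard t tables for 28 degrees of freedom (roughly 2.47), which is consistent with the reported p value of <.001.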
Perfect Correlations
For every one-unit increase in x, y decreases by a constant amount
This means all the dots fall on a straight line, and r = -1 (perfect negative correlation)
For every one-unit increase in x, y increases by a constant amount
This means all the dots fall on a straight line, and r = +1 (perfect positive correlation)
These two graphs show a perfect positive correlation (on the left) and a perfect negative correlation (on the right). This occurs when a unit change in one variable is associated with a constant change (either positive or negative) in a second variable. You will rarely see this outcome in the real world, except when you are looking at the correlation of a variable with itself, which will always be a perfect positive correlation.
Zero Correlation
There is no trend apparent
When there is a zero correlation there is no apparent trend in the data. The points are scattered across the graph, as in the example above.
Hazel (2023)
Practice-focused question: “Will a retrospective program evaluation of a local QI initiative at the site demonstrate that child immunization care adherence improves with the deployment of phone call reminders?”
Correlational analysis showed a statistically significant relationship between the passage of time (2021 to 2022) and completed appointments (r = .710, p = .004).
Turgut & Yildez (2023)
Investigation of grief and posttraumatic growth related to patient loss in pediatric intensive care nurses: a cross sectional study.
Recall this study introduced in week 5 when we studied ANOVA and Kruskal-Wallis.
The two instruments used in the study are the Texas grief inventory and post traumatic growth scale.
r = .0144 and the p value is 0.041. This indicates that when grief is elevated, so is posttraumatic growth.
I mentioned that correlational analysis can be either parametric (Pearson’s r) or nonparametric (Spearman’s rho). However, one key requirement is that the data be measured on a continuous scale. In Moradkhani et al., all of the data were collapsed into categories, precluding the use of Pearson or Spearman.
However, there is a version of the chi-square test called Pearson’s chi-square (the shared name is no coincidence), and you may recall seeing it last week when we reviewed chi-square results. It measures association between two categorical variables; although it yields a p value, it is a comparison of counts across a contingency table rather than a correlation coefficient.
Moradkhani et al. used this type of test of association, and they report p values for the various categorical variables measured in the study.
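To make the idea of comparing counts across a contingency table concrete, here is a minimal sketch of the Pearson chi-square statistic. The counts are hypothetical and are not taken from Moradkhani et al.; the function name `pearson_chi_square` is an assumption.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table: rows are two groups, columns are two outcome categories.
observed_counts = [[10, 20],
                   [20, 10]]

chi2, df = pearson_chi_square(observed_counts)
print(round(chi2, 3), df)
```

The statistic grows as the observed counts move away from the counts expected under no association; software then converts it, together with the degrees of freedom, into the p value reported for each categorical variable.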
Key Points
A scatterplot gives a clear visual of a correlation
Pearson’s r is the parametric correlation statistic
Spearman’s rho is the nonparametric equivalent
Correlations range from -1 to +1.
The closer to -1 or +1, the stronger the correlation; the closer to 0, the weaker (and the less likely to be statistically significant)
Correlation does not imply causality.