8210 WK 8 DISCUSSION


Before You Begin

Before reading this Skill Builder, be sure to review the following concepts:

· Steps of hypothesis testing

· Null hypothesis 

· Alternative hypotheses

· p-value

· Alpha

· Under what circumstances to reject the null hypothesis

· Effect size

· Practical significance/meaningfulness

Correlation Coefficients

Researchers who study adolescent peer relationships are often interested in whether peers shape one another's attitudes toward many different things, including drug and alcohol use, delinquent behavior, and academics. Suppose you are a researcher interested in whether adolescents and their peers shape each other's attitudes toward academics. Although it can be tricky to obtain data that would allow you to examine causal relationships between adolescents’ and peers’ academic attitudes, you can, as a start toward studying this topic, examine whether there are associations between adolescents’ and peers’ academic values. For example, do adolescents who value academics tend to have friends who also value academics?

Pearson’s Correlation

Pearson’s correlation is one method of examining associations among variables. It allows researchers to examine how one variable changes as another variable changes. For example, does one variable tend to increase as the other variable increases? Or does one variable tend to decrease as the other variable increases?
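The idea of two variables "changing together" corresponds to the standard formula for Pearson's r: the sum of cross-products of each variable's deviations from its mean, divided by the square root of the product of the two sums of squared deviations. The minimal Python sketch below implements that formula with only the standard library. The five pairs of scores are hypothetical and are not the Loken (2005) data; on Python 3.10 and later, `statistics.correlation` should return the same value.

```python
import math

def pearson_r(x, y):
    """Pearson's r for paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from each mean;
    # the 1/(n - 1) factors of the covariance and variances cancel in the ratio.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 1-7 English value scores (NOT the Loken, 2005 data).
student = [2, 3, 4, 5, 6]
peer = [3, 3, 5, 4, 6]
print(round(pearson_r(student, peer), 3))  # 0.849
```

A positive r like this one means that higher student scores tend to go with higher peer scores. If either variable has no variability (all scores equal), the denominator is zero and the function raises `ZeroDivisionError`, because r is undefined in that case.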

The table below shows an example of SPSS output from Pearson’s correlation showing the association between adolescents’ value for the subject of English and their peers’ value for English (Loken, 2005). Students answered questions indicating how much value they place on English (e.g., how much they like English), and their peers answered the same set of questions (Eccles et al., 1983). Scores on the English value variable range from a low of 1 to a high of 7, with higher scores indicating a greater degree of value for English.

References:

Eccles et al. (1983). Expectancies, values, and academic behaviors. In Achievement and achievement motivation. W. H. Freeman.

Loken, E. (2005). Academic achievement in middle schoolers.

Table: SPSS Output from Pearson's Correlation

                                                   English Value   Peer Group's English Value
English Value               Pearson Correlation    1               .438**
                            Sig. (2-tailed)                        .000
                            N                      67              63
Peer Group's English Value  Pearson Correlation    .438**          1
                            Sig. (2-tailed)        .000
                            N                      63              63

**. Correlation is significant at the 0.01 level (2-tailed).

Pearson’s correlation is typically used when research scenarios meet these criteria: 

· The researcher wants to examine the association between two variables.

· Both variables can be considered continuous.

Although the relationship between the two variables is not always linear, we will only focus on linear associations in this Skill Builder. While we will not focus on scatter plots here, examining a scatter plot is an effective way to see if the association is linear or curvilinear and to gauge the strength and direction of the association between the variables.
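The reason linearity matters is that Pearson's r measures only linear association. A perfect but U-shaped (curvilinear) relationship can produce a correlation of zero, which a scatter plot would reveal immediately. The sketch below uses made-up data in which y is exactly x squared.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-products over the root of the product of sums of squares."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is exactly x squared, a perfect U-shaped (curvilinear) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
r = pearson_r(x, y)
print(f"r = {r:.3f}")  # r = 0.000
```

Even though y is completely determined by x, r is zero because the rising and falling halves cancel out. A correlation near zero therefore means "no linear association", not necessarily "no association".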

The Strength and Direction of Correlation Coefficients

When researchers examine the correlation coefficients in their SPSS output, how do they interpret them to discern the strength and direction of the association between the two variables? First, when we examine our SPSS output, we should think about the null hypothesis that we are testing, and we will need to decide whether or not to reject the null hypothesis. The null and alternative hypotheses for a Pearson’s correlation test can be written as:

Null: ρ = 0

Alternative: ρ ≠ 0

Here ρ (the Greek letter rho) stands for the correlation in the population; it is not the p-value. The null hypothesis states that there is no association between the variables; that is, the population correlation coefficient is equal to zero. The alternative hypothesis, on the other hand, specifies that there is an association between the variables: the population correlation coefficient is not equal to zero. In our example of students’ and peers’ English value, SPSS reports a p-value of .000 (see the “Sig. (2-tailed)” value in the SPSS correlation output). Because SPSS rounds to three decimal places, .000 means p < .001; a p-value is never exactly zero. If we assume that alpha was set at .05, we would reject the null hypothesis and conclude that the results are consistent with there being an association between students’ and peers’ value for English.

Correlations (SPSS Output: Students' and Peers' English Value)

Simply looking at the p-value in order to interpret correlation results will not be sufficient, however. Now that we have concluded that there is evidence of an association between the variables, we need to figure out the direction and the strength of the association between the two variables. This helps researchers understand the nature of the relationship between the variables and get a sense of the effect size for the correlation results, which will help with understanding the practical significance, or meaningfulness, of the results.   

The correlation value will indicate two possible directions for the association, based on whether the value in our output is positive (e.g., .438) or negative (e.g., -.438):

· Positive: As one variable increases, the other variable increases.

· Negative: As one variable increases, the other variable decreases.

Based on the correlation value, researchers can also discern whether the association between the variables is weak, moderate, or strong. The correlation value will range from -1 to 1. In order to assess the strength of the association, we want to pay attention to the absolute value of the correlation coefficient. In other words, a .4 correlation coefficient and a -.4 correlation coefficient will indicate the same strength of association.

General Rule for Interpreting the Correlation Coefficient

Although different sources give slightly different information about assessing the strength of a correlation coefficient, we can use this as a general rule:

· .8 to 1: very strong

· .6 to .8: strong

· .4 to .6: moderate

· .2 to .4: weak

· 0 to .2: very weak to no relationship

Again, note that we are interested in the absolute value of the correlation coefficient. So, for example, if our correlation coefficient were to be -.8, we would conclude that there is a very strong relationship between the variables.
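The general rule can be written as a small function that uses the sign for direction and the absolute value for strength. The rule's ranges share their endpoints (.8 appears in both "very strong" and "strong"), so this sketch assumes a boundary value belongs to the stronger category, which matches the -.8 example above. The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Label a correlation with the general rule: sign gives direction, |r| gives strength."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    # Assumption: boundary values (.2, .4, .6, .8) go to the stronger category.
    if size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        return "very weak to no relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_r(0.438))  # moderate positive
print(describe_r(-0.8))   # very strong negative
```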

In the Students' and Peers' English Value SPSS output above, our Pearson’s correlation value is .438. If we were reporting this result in APA style, we would say that there is a moderate positive relationship between students’ and peers’ English value, r(61) = .44, p < .01, where 61 is the degrees of freedom (N - 2 = 63 - 2). The correlation coefficient does not have a negative sign in front of it, so it is positive, and according to the general rule above it falls in the .4 to .6 range, so it is of moderate strength. The positive relationship indicates that students who place more value on English tend to have peers who also place more value on English. In other words, students and their peers show some degree of similarity in the value they place on English, which is probably not surprising.
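APA style drops the leading zero for statistics that cannot exceed 1 in absolute value (such as r and p) and reports the degrees of freedom, N - 2, in parentheses. The helper below is a hypothetical sketch that assembles such a string. Because SPSS only displays the p-value as .000, the example passes 0.0003 as an illustrative stand-in, together with the .01 cutoff flagged in the output.

```python
def no_leading_zero(value, decimals):
    """Format a statistic bounded by 1 in absolute value without its leading zero, e.g. .44."""
    text = f"{value:.{decimals}f}"
    return text.replace("0.", ".", 1)

def apa_correlation(r, n, p, threshold=0.001):
    """APA-style report of Pearson's r with df = n - 2, e.g. 'r(61) = .44, p < .01'."""
    df = n - 2
    r_text = no_leading_zero(r, 2)
    if p < threshold:
        # Report "p < cutoff" when p falls below the chosen cutoff.
        p_text = "< " + f"{threshold:g}".replace("0.", ".", 1)
    else:
        p_text = "= " + no_leading_zero(p, 3)
    return f"r({df}) = {r_text}, p {p_text}"

# 0.0003 is a hypothetical stand-in: SPSS only displays this p-value as .000.
print(apa_correlation(0.438, 63, p=0.0003, threshold=0.01))  # r(61) = .44, p < .01
```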

Regression Models

In addition to Pearson’s correlation analysis, linear regression is another way to examine the association between two variables. Regression models are quite flexible and, unlike a single Pearson’s correlation analysis, allow the researcher to include multiple predictor variables in the same model to examine their association with the dependent variable. Regression models can also incorporate both categorical and continuous predictors. For the sake of this Skill Builder, however, we will focus on models that have a single predictor variable (bivariate regression models), and on cases in which both variables (the predictor and the outcome) are continuous.
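With one continuous predictor, the least-squares regression line has a closed form: the slope B is the sum of cross-products divided by the predictor's sum of squared deviations, and the intercept is the outcome mean minus B times the predictor mean. The sketch below shows that calculation on hypothetical scores; the name `fit_line` is illustrative, and models with several predictors need matrix methods that this sketch does not cover.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                     # change in predicted y per one-unit change in x
    intercept = mean_y - slope * mean_x   # predicted y when x = 0
    return intercept, slope

# Hypothetical scores: peer-group value is the predictor, student value the outcome.
peer = [3, 3, 5, 4, 6]
student = [2, 3, 4, 5, 6]
b0, b1 = fit_line(peer, student)
print(f"predicted student value = {b0:.3f} + {b1:.3f} * peer value")
```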

Let’s return to our example of the association between students’ and peers’ value for English. We can run a regression model that is comparable to the Pearson’s correlation analysis that we did previously. Take a look below at the regression results from SPSS.

Regression Results

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .438(a)   .191       .178                1.35361

a. Predictors: (Constant), peer group's English value.
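In a bivariate regression, the R in the Model Summary equals the absolute value of Pearson's r, R Square equals r², and the standardized coefficient (Beta) equals r. That is why .438 appears in the correlation table, as R, and as Beta in the Coefficients table. (SPSS shows R Square as .191 rather than .438² ≈ .192 because it works from unrounded values: 26.466 / 138.234 ≈ .1915.) The sketch below checks these identities on hypothetical data; the function name is illustrative.

```python
import math

def bivariate_summary(x, y):
    """Return (r, beta, r_squared) for a one-predictor least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Standardized slope: B * SD(x) / SD(y); the n - 1 terms in the SDs cancel.
    beta = slope * math.sqrt(sxx / syy)
    # R Square: share of the outcome's total sum of squares explained by the line.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_residual / syy
    return r, beta, r_squared

peer = [3, 3, 5, 4, 6, 7, 2]      # hypothetical predictor scores
student = [2, 3, 4, 5, 6, 6, 1]   # hypothetical outcome scores
r, beta, r_squared = bivariate_summary(peer, student)
print(f"r = {r:.4f}, Beta = {beta:.4f}, R = {math.sqrt(r_squared):.4f}, R Square = {r_squared:.4f}")
```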

ANOVA(a)

Model              Sum of Squares   df   Mean Square   F        Sig.
1   Regression     26.466           1    26.466        14.445   .000(b)
    Residual       111.768          61   1.832
    Total          138.234          62

a. Dependent Variable: English value.
b. Predictors: (Constant), peer group's English value.
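The Model Summary values can be reproduced from the ANOVA table: R Square is the regression sum of squares divided by the total sum of squares, Adjusted R Square corrects for the number of predictors, F is the ratio of the two mean squares, and the Std. Error of the Estimate is the square root of the residual mean square. The sketch below copies the sums of squares and df from the table. Because those inputs are already rounded, the results match SPSS only up to rounding.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, ss_residual = 26.466, 111.768
df_regression, df_residual = 1, 61
n = df_regression + df_residual + 1          # 63 cases used in the model
k = df_regression                            # number of predictors

ss_total = ss_regression + ss_residual       # 138.234, the Total row
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)  # Std. Error of the Estimate

print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_stat:.3f}, SEE = {std_error_estimate:.5f}")
```

With a single predictor, F is also the square of the slope's t statistic (3.801² ≈ 14.45), so the ANOVA test and the coefficient test lead to the same conclusion.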

Coefficients(a)

                                 Unstandardized Coefficients   Standardized Coefficients                    95% Confidence Interval for B
Model                            B         Std. Error          Beta                        t       Sig.     Lower Bound   Upper Bound
1   (Constant)                   1.667     .769                                            2.168   .034     .129          3.204
    peer group's English value   .654      .172                .438                        3.801   .000     .310          .998

a. Dependent Variable: English value.

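Legend for the Coefficients table:

· B (Unstandardized Coefficients): unstandardized regression coefficient

· Beta (Standardized Coefficients): standardized regression coefficient

· Sig. for peer group's English value: p-value for the association between students' and peers' English value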

The Standardized Regression Coefficient


Unstandardized Regression Coefficient


The Strength of the Effect

Term

Meaning

+∞

Positive infinity.

-.564

Observed value of the test statistic.

-∞

Negative infinity.

.004

p-value

.576

p-value

2-tailed

The alternative hypothesis states simply that there is a difference between the means but does not specify the direction of the difference.

61

61 is the degrees of freedom (df), calculated as n - 2 (63 - 2).

alpha

The probability of a Type I error; the significance level the researcher sets in advance, commonly .05.

box-plot

A graph that displays key elements of a distribution, such as the median, the quartiles, and potential outliers.

categorical variables

Variables that have a limited number of possible values; participants in the study get placed into one of a small number of categories for the variable.

central limit theorem

Regardless of the distribution of the population, if the sample size is relatively large (a rule of thumb is n > 30), the sampling distribution of sample means is close to normal.

Cohen’s d

A measure of effect size for the difference between two means, expressed in standard deviation units.

confidence intervals

A range of values, computed from sample data, that is likely to contain the population parameter; the level of confidence specifies how often intervals constructed this way would contain it.

continuous variable

A continuous variable is one based on an interval or ratio level of measurement. Between any two values for the variable, there is another possible value.


control group

The collection of participants in the condition of an experiment who do not receive the treatment. A group receiving an actual treatment can then be compared to the control group.

dependent variable

A measure of the outcome that allows us to determine whether the independent variable has an effect.

discrete

A variable based on an ordinal, interval, or ratio level of measurement that has a countable, not infinite, set of possible values.

distribution of a population

The distribution of all values for all elements of the population.

distribution of a sample

The distribution of actual observations based on the data that you collect.

distribution of the sample

Sample distribution (also called distribution of the sample): for a variable, the distribution of values for the elements of the population that are actually observed. Note that the sample distribution is different from the sampling distribution.

element

An entity in the population that may be selected for the sample and then observed.

factor

A categorical independent variable in an experimental design or ANOVA; its categories are called levels.

frequency distribution

A table or graph that shows the values of a variable and the number (count) of observations associated with each value.

general rule

Although different sources give slightly different information about assessing the strength of a correlation coefficient, we can use the following as a general rule for interpreting the correlation coefficient: .8 to 1: very strong; .6 to .8: strong; .4 to .6: moderate; .2 to .4: weak; 0 to .2: very weak to no relationship.

independent variable

The variable that is studied to see if it causes a change in a dependent variable.

interval

The level of measurement that addresses differences, or intervals, between entities.

interval estimates

A range of values that is likely to contain the population parameter.

levels of confidence

The proportion of intervals, constructed in the same way over repeated samples, that would contain the population parameter. Usually, the level of confidence is 0.95 or 95%.

levels of measurement

Also called the scale of measurement; describes the amount and type of information (nominal, ordinal, interval, and ratio) conveyed by the numbers or words assigned to real-world objects during the measurement process.

Levene’s test

Tests the null hypothesis that the two populations have equal variances.

margin of error

The amount of estimated error in the point estimate of a population parameter, determined by the level of confidence and the sampling distribution of the sample statistic. In estimating a population mean, the margin of error equals a critical value for the statistic times the standard error of the mean, e.g., \( z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \).

mean

The average of the scores for a variable.

median

The middle value when the scores are ordered from lowest to highest; an appropriate measure of central tendency when measurement is at the ordinal, interval, or ratio level.

mode

The most frequently occurring value in the data set.

n

n = sample size

n1

n1 = the number of participants in sample 1

n2

n2 = the number of participants in sample 2

negative skew

This refers to the tail of the distribution appearing longer on the left-hand side of the distribution.

nominal

The lowest level of measurement, which addresses naming—identifying or categorizing objects using a name.

one-tailed

The alternative hypothesis is directional and states that one mean is greater than the other.

ordinal

The level of measurement above nominal that addresses ordering real-world entities.

outliers

Observation points that are distant from other observations.

p < .01

This indicates that the p-value (displayed by SPSS as .000, meaning p < .001) is less than .01 and that the correlation test is statistically significant.

p-value

The probability of obtaining a result equal to or "more extreme" than what was actually observed, when the null hypothesis is true.

pictogram

A graphic character used in picture writing.

point estimate

An estimate of the unknown parameter of interest using a single value.

population

The set of all possible elements (entities and observations) to which the researcher wishes to generalize.

population distribution

For a variable, the distribution of all values for all elements of the population.

positive skew

This refers to the tail of the distribution appearing longer on the right side of the distribution.

qualitative

A variable based on nominal measurement.

quantitative

A variable with an ordinal, interval or ratio level of measurement.

r

r is the symbol indicating a Pearson’s correlation coefficient.

r-squared

The proportion of variability in the dependent variable that is accounted for by your model.

random assignment

Random assignment is placing experimental units in treatment conditions or control conditions by use of a random process.

random sampling

The selection of experimental units so that each element in the population has the same chance of being selected for the sample.

random variable

A variable whose value is determined by a random process such as being selected in a survey or being observed in an experiment.

ratio

The level of measurement that addresses proportion, or ratios, between entities.

ratio level

The level of measurement that addresses proportion, or ratios, between entities.

relative frequency distribution

A table or graph that shows the values of a variable and the proportion of observations associated with each value using decimal fractions or percentages.

research design

The overall plan for how a researcher will collect data.

sample

A subset of all possible observations.

sampling distribution

The distribution of a sample statistic.

sampling distribution of the sample mean

The distribution of values for the sample mean for all possible random samples of size n.

sampling error

The absolute value of the difference between a sample statistic and the population parameter it estimates.

simple random sampling

Each unit in the population has an equal chance of being selected into the sample.

statistical analyses

The use of probabilistic models to analyze data.

statistical inferences

The process of using sample information to make statements about population parameters.

statistical power

The probability of rejecting a null hypothesis if the null is false (i.e., the alternative is true).

statistically significant

Statistical significance means a null hypothesis has been rejected.

t-test for two independent groups

A statistical test used to examine whether two independent groups have different means on a dependent variable. This test is also sometimes referred to as an independent samples t-test.

two-tailed

The alternative hypothesis states simply that there is a difference between the means but does not specify the direction of the difference.

Type I error

Rejecting the null hypothesis if the null is actually true.

Type II error

Incorrectly retaining a false null hypothesis (a "false negative").

unit of analysis

The real-world entity that is observed and for which data are recorded and used in statistical analysis.

value

A single observation defined for a variable.

variable

The mathematical representation of the real-world entity being measured.

variance

Variance is a measure of variability in a set of observations based on the approximate average of squared deviations from the mean.    

visual displays of data

Help researchers communicate the distribution and other key information (the story they are telling with their data) both effectively and efficiently.

μ1

mean for population 1

μ2

mean for population 2

β

β is the symbol researchers use when they report a standardized regression coefficient.

μ not primed

This indicates the population mean for the “not primed” condition.

μ not primed - μ primed >0

The alternative hypothesis specifies that the “not primed” condition will score higher than the “primed” condition.

μ primed

This indicates the population mean for the “primed” condition.