Discussion
12
Inferential Analysis
William M. Trochim
James P. Donnelly
Kanika Arora
2e
© 2016 Cengage Learning. All Rights Reserved.
The analysis procedure you choose is based on your research design
All of the procedures in this chapter are based on the General Linear Model (GLM)
A system of equations that is used as the mathematical framework for most of the statistical analyses used in applied social research
12.1 Foundations of Analysis for Research Design
Statistical analyses used to reach conclusions that extend beyond the immediate data alone
The GLM uses dummy variables
A variable that uses discrete numbers, usually 0 and 1, to represent different groups in your study
12.2 Inferential Statistics
Uses the GLM to estimate statistical significance
p value: an estimate of the probability of obtaining a result at least as extreme as yours if the null hypothesis is true
Statistical significance is not enough; we need an effect size, as well
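To make the idea of an effect size concrete, here is a minimal sketch of one common choice for a two-group comparison, the standardized mean difference (Cohen's d). The slide does not name a specific effect size, so this choice, the function name `cohens_d`, and the scores are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pooled variance weights each group's sample variance by its df (n - 1)
    pooled_var = ((n_t - 1) * variance(treatment)
                  + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical posttest scores, invented for illustration only
treatment = [54, 56, 58, 60, 62]
control = [50, 52, 54, 56, 58]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

Unlike the p value, d does not grow simply because the sample gets larger, which is why it is reported alongside statistical significance.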
12.2 Inferential Statistics – Statistical Significance
12.2 Statistical and Practical Significance
Table 12.1 Possible outcomes of a study with regard to statistical and practical significance
Foundation for
t-test
ANOVA and ANCOVA
Regression, factor, and cluster analyses
Multidimensional scaling
Discriminant function analysis
Canonical correlation
12.3 General Linear Model
Assumptions
The relationships between variables are linear
Samples are random and independently drawn from the population
Variables have equal (homogeneous) variances
Variables have normally distributed error
12.3 General Linear Model
12.3a The Two-Variable Linear Model
Figure 12.2 A bivariate plot.
Figure 12.3 A straight-line summary of the data.
Linear model: Any statistical model that uses equations to estimate lines.
12.3a The Straight Line Model
Figure 12.4 The straight-line model.
Regression line: A line that describes the relationship between two or more variables.
Regression analysis: A general statistical analysis that enables us to model relationships in data and test for treatment effects. In regression analysis, we model relationships that can be depicted in graphic form with lines that are called regression lines.
12.3a Estimates Using the Two-Variable Linear Model
Figure 12.5 The two-variable linear model.
Figure 12.6 What the model estimates.
Error term: A term in a regression equation that captures the degree to which the line is in error (that is, the residual) in describing each point.
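The two-variable linear model can be sketched in code, assuming it takes the usual form y = b0 + b1*x + e, where b0 is the intercept, b1 is the slope, and e is the error term (the figures themselves are not reproduced here). The data and the helper name `fit_line` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares estimates of the intercept (b0) and slope (b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical bivariate data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)

# The error term for each point is its residual: observed minus predicted
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

With least-squares estimates the residuals sum to zero, so the fitted line passes through the point of means.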
12.3b The “General” in the General Linear Model
The GLM allows you to summarize a wide variety of research outcomes
The major problem for the researcher who uses the GLM is model specification
How to identify the equation that best summarizes the data for a study
If the model is misspecified, the estimates of the coefficients (the b-values) that you get from the analysis are likely to be biased
12.3b The “General” in the General Linear Model (cont’d.)
Enable you to use a single regression equation to represent multiple groups
Act like switches that turn various values on and off in an equation
12.3c Dummy Variables
Figure 12.7 Use of a dummy variable in a regression equation
12.3c Using Dummy Variables
Figure 12.8 Using a dummy variable to create separate equations for each dummy variable value.
Figure 12.9 Determine the difference between two groups by subtracting the equations generated through their dummy variables.
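The switch behavior of a dummy variable can be sketched as follows, assuming a single equation of the form y = b0 + b1*Z, where Z = 0 for the control group and Z = 1 for the treatment group. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict(b0, b1, z):
    """Predicted group mean from one equation; z is the 0/1 dummy variable."""
    return b0 + b1 * z


# Hypothetical coefficients, invented for illustration only
b0, b1 = 50.0, 4.0

control_mean = predict(b0, b1, z=0)    # Z = 0 switches b1 off: y = b0
treatment_mean = predict(b0, b1, z=1)  # Z = 1 switches b1 on:  y = b0 + b1

# Subtracting the two equations leaves only b1, the group difference
difference = treatment_mean - control_mean
print(control_mean, treatment_mean, difference)
```

This is why the coefficient on the dummy variable can be read directly as the estimated difference between the two groups.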
Assesses whether the means of two groups (for example, the treatment and control groups) are statistically different from each other
12.3d The t-Test
Figure 12.10 Idealized distributions for treated and control group posttest values
12.3d Three Scenarios
Figure 12.11 Three scenarios for differences between means.
12.3d Low-, Medium-, and High-Variability Scenarios
Table 12.2 shows the low-, medium-, and high-variability scenarios represented with data that correspond to each case.
The first thing to notice about the three situations is that the difference between the means is the same in all three.
When you are looking at the differences between scores for two groups, you have to judge the difference between their means relative to the spread or variability of their scores
The t-test does just this: it judges the difference between the means of two groups relative to the variability of their scores
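A small sketch shows why variability matters. It assumes the unpooled form of the t-value (difference between means divided by the standard error of the difference); the scores and the helper name `t_value` are hypothetical. Both scenarios have the same 10-point difference between means, but very different spread.

```python
from math import sqrt
from statistics import mean, variance


def t_value(treatment, control):
    """Difference between means divided by the standard error of the difference."""
    se_diff = sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se_diff


# Hypothetical scores, invented for illustration only
low_var_treatment = [59, 60, 60, 60, 61]
low_var_control = [49, 50, 50, 50, 51]
high_var_treatment = [40, 50, 60, 70, 80]
high_var_control = [30, 40, 50, 60, 70]

t_low = t_value(low_var_treatment, low_var_control)
t_high = t_value(high_var_treatment, high_var_control)
print(f"low variability:  t = {t_low:.2f}")
print(f"high variability: t = {t_high:.2f}")
```

The low-variability scenario yields a much larger t-value even though the mean difference is identical in both cases.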
12.3d Difference Between the Means
12.3d Formula for the t-Test
Figure 12.12 Formula for the t-test. (left)
Figure 12.13 Formula for the standard error of the difference between the means. (top right)
Figure 12.14 Final formula for the t-test. (bottom right)
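Because the figures are not reproduced here, the three formulas are sketched below in their commonly given form. The notation is an assumption: subscripts T and C denote the treatment and control groups, \bar{x} is a group mean, var is a group variance, and n is a group size.

```latex
% t-value: difference between the group means relative to its standard error
t = \frac{\bar{x}_T - \bar{x}_C}{SE(\bar{x}_T - \bar{x}_C)}

% Standard error of the difference between the means
SE(\bar{x}_T - \bar{x}_C) = \sqrt{\frac{var_T}{n_T} + \frac{var_C}{n_C}}

% Final formula, substituting the standard error into the first expression
t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{var_T}{n_T} + \dfrac{var_C}{n_C}}}
```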
t-Value
Standard error of the difference
Variance
Standard deviation (sd)
Alpha level (α)
Degrees of freedom (df)
12.3d The t-Test
The regression formula for the t-test & ANOVA
Figure 12.15 The regression formula for the t-test (and also the two-group one-way posttest-only Analysis of Variance or ANOVA model).
t-value: The estimate of the difference between the groups relative to the variability of the scores in the groups.
Standard error of the difference: A statistical estimate of the standard deviation one would obtain from the distribution of an infinite number of estimates of the difference between the means of two groups.
Variance: A statistic that describes the variability in the data for a variable. The variance is the spread of the scores around the mean of a distribution. Specifically, the variance is the sum of the squared deviations from the mean divided by the number of observations minus 1.
Standard deviation: The spread or variability of the scores around their average in a single sample. The standard deviation, often abbreviated SD, is mathematically the square root of the variance. The standard deviation and variance both measure dispersion, but because the standard deviation is measured in the same units as the original measure and the variance is measured in squared units, the standard deviation is usually more directly interpretable and meaningful.
Alpha level: The p value selected as the significance level. Specifically, alpha is the probability of a Type I error, that is, of concluding that there is a treatment effect when, in reality, there is not.
Degrees of freedom (df): A statistical term that is a function of the sample size. In the t-test formula, for instance, the df is the number of persons in both groups minus 2.
Meets the following requirements:
Has two groups
Uses a post-only measure
Has a distribution for each group on the response measure, each with an average and variation
Assesses treatment effect as the statistical (non-chance) difference between the groups
12.4a The Two-Group Posttest-Only Randomized Experiment
Three tests meet these requirements, and they all yield the same results
Independent t-Test
One-way ANOVA
Regression analysis
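The equivalence of the three tests can be checked numerically. The sketch below assumes the pooled-variance form of the independent t-test (which matches the unpooled form when the groups are the same size) and a regression of the posttest on a single 0/1 dummy variable Z. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical posttest scores, invented for illustration only
treatment = [54, 56, 58, 60, 62]
control = [50, 52, 54, 56, 58]
n_t, n_c = len(treatment), len(control)
df_within = n_t + n_c - 2

# 1) Independent t-test with pooled variance
ss_within = (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
pooled_var = ss_within / df_within
t = (mean(treatment) - mean(control)) / sqrt(pooled_var * (1 / n_t + 1 / n_c))

# 2) One-way ANOVA with two groups (between-groups df = 1)
grand_mean = mean(treatment + control)
ss_between = (n_t * (mean(treatment) - grand_mean) ** 2
              + n_c * (mean(control) - grand_mean) ** 2)
f = (ss_between / 1) / (ss_within / df_within)

# 3) Regression of y on the dummy variable Z (1 = treatment, 0 = control)
z = [1] * n_t + [0] * n_c
y = treatment + control
z_bar, y_bar = mean(z), mean(y)
b1 = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
      / sum((zi - z_bar) ** 2 for zi in z))
b0 = y_bar - b1 * z_bar

print(f"t = {t:.2f}, t^2 = {t ** 2:.2f}, F = {f:.2f}, b0 = {b0:.1f}, b1 = {b1:.1f}")
```

The squared t-value equals F, the regression slope b1 equals the difference between the group means, and the intercept b0 equals the control group mean.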
12.4a The Two-Group Posttest-Only Randomized Experiment (cont’d.)
Analysis requires results for two main effects and one interaction effect in a 2 x 2 factorial design
12.4b Factorial Design Analysis
Figure 12.17 Regression model for a 2 x 2 factorial design.
Main effect: An outcome that shows consistent differences between all levels of a factor.
Interaction effect: An effect that occurs when differences on one factor depend on which level you are on another factor.
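A minimal sketch of the 2 x 2 model, assuming it takes the form y = b0 + b1*Z1 + b2*Z2 + b3*Z1*Z2 + e with 0/1 dummy variables Z1 and Z2. Because this model has one coefficient per cell, its least-squares estimates can be read straight from the four cell means. Note that with 0/1 coding, b1 and b2 are the effects of each factor at the 0 level of the other factor, and b3 is the interaction. The scores are hypothetical.

```python
from statistics import mean

# Hypothetical scores keyed by (Z1, Z2), invented for illustration only
cells = {
    (0, 0): [50, 52, 54],
    (1, 0): [55, 57, 59],
    (0, 1): [53, 55, 57],
    (1, 1): [64, 66, 68],
}
m = {key: mean(scores) for key, scores in cells.items()}

b0 = m[(0, 0)]                                       # mean of the reference cell
b1 = m[(1, 0)] - m[(0, 0)]                           # effect of factor 1 when Z2 = 0
b2 = m[(0, 1)] - m[(0, 0)]                           # effect of factor 2 when Z1 = 0
b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]   # interaction


def predict(z1, z2):
    """Predicted cell mean from the single regression equation."""
    return b0 + b1 * z1 + b2 * z2 + b3 * z1 * z2


print(b0, b1, b2, b3)
```

A nonzero b3 means the difference on one factor depends on the level of the other, which is the definition of an interaction effect given above.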
The dummy variable Z1 represents the treatment group
The other dummy variables indicate the blocks
The beta values (β) are the coefficients for the corresponding treatment and block dummy variables
12.4c Randomized Block Analysis
Figure 12.18 Regression model for a Randomized Block design
An analysis that estimates the difference between the groups on the posttest after adjusting for differences on the pretest
12.4d Analysis of Covariance
Figure 12.19 Regression model for the ANCOVA.
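The pretest adjustment can be sketched as follows, assuming the ANCOVA model has the form y = b0 + b1*X + b2*Z + e, with X the pretest and Z the 0/1 treatment dummy. Under that model, b1 is the pooled within-group slope of posttest on pretest, and b2 is the posttest difference after removing the part predicted by the pretest difference. The data and the function name are hypothetical.

```python
from statistics import mean


def ancova_adjusted_difference(pre_t, post_t, pre_c, post_c):
    """Return (pooled slope, raw posttest difference, adjusted difference)."""
    def cross(x, y):
        # Sum of cross-products of deviations from the group means
        x_bar, y_bar = mean(x), mean(y)
        return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Pooled within-group slope (b1 in the assumed model)
    slope = ((cross(pre_t, post_t) + cross(pre_c, post_c))
             / (cross(pre_t, pre_t) + cross(pre_c, pre_c)))
    raw = mean(post_t) - mean(post_c)
    # Adjusted difference (b2): raw difference minus what the pretest gap predicts
    adjusted = raw - slope * (mean(pre_t) - mean(pre_c))
    return slope, raw, adjusted


# Hypothetical pretest and posttest scores, invented for illustration only
pre_t, post_t = [12, 14, 16, 18], [20, 22, 24, 26]
pre_c, post_c = [10, 12, 14, 16], [15, 17, 19, 21]

slope, raw, adjusted = ancova_adjusted_difference(pre_t, post_t, pre_c, post_c)
print(f"slope = {slope}, raw difference = {raw}, adjusted difference = {adjusted}")
```

In this example the groups differ by 5 points on the posttest, but 2 of those points are accounted for by the pretest difference, leaving an adjusted difference of 3.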
Quasi-experimental designs still use the GLM, but it has to be adjusted for measurement error
Measurement error: any influence on an observed score not related to what you are attempting to measure
This adjustment for error makes the analyses more complicated
12.5 Quasi-Experimental Analysis
12.5a Nonequivalent Groups Analysis
Figure 12.21 Formula for adjusting pretest values for unreliability in the reliability-corrected ANCOVA
Figure 12.22 The regression model for the reliability corrected ANCOVA
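The pretest adjustment can be sketched in code under a stated assumption: that the correction takes the commonly given form X_adj = mean + r * (X - mean), where mean is the pretest mean of the person's own group and r is the reliability of the pretest. The figure itself is not reproduced here, so treat this form, the function name, and the numbers as illustrative rather than as the chapter's exact formula.

```python
from statistics import mean


def adjust_pretest(pretest, reliability):
    """Shrink each pretest score toward its group mean by the reliability."""
    group_mean = mean(pretest)
    return [group_mean + reliability * (x - group_mean) for x in pretest]


# Hypothetical pretest scores for one group and a hypothetical reliability
pretest = [40, 50, 60]
adjusted = adjust_pretest(pretest, reliability=0.8)
print(adjusted)
```

The adjustment would be applied separately within each group, and the adjusted pretest would then replace the raw pretest in the regression model. The group mean is unchanged; only the spread around it shrinks.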
12.5b Regression-Discontinuity Analysis
Figure 12.23 Adjusting the pretest by subtracting the cutoff in the Regression-Discontinuity (RD) analysis model.
Figure 12.24 The regression model for the basic regression-discontinuity design.
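The data preparation step can be sketched as follows. It assumes the basic model has the form y = b0 + b1*(X - cutoff) + b2*Z + e, so that after centering, b2 estimates the jump in the regression line at the cutoff. The assignment rule (scores below the cutoff receive the treatment), the function name, and the data are all hypothetical.

```python
def prepare_rd(pretest, cutoff, treat_below=True):
    """Return (centered pretest, treatment dummy) pairs for an RD analysis."""
    rows = []
    for x in pretest:
        centered = x - cutoff  # pretest is now 0 exactly at the cutoff
        if treat_below:
            z = 1 if x < cutoff else 0   # assumed rule: low scorers are treated
        else:
            z = 1 if x >= cutoff else 0  # alternative rule: high scorers are treated
        rows.append((centered, z))
    return rows


# Hypothetical pretest scores and cutoff, invented for illustration only
rows = prepare_rd([40, 50, 60], cutoff=50)
print(rows)
```

Centering matters because the treatment effect in this design is evaluated at the cutoff, not at a pretest score of zero.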
12.5c Regression Point Displacement Analysis
Figure 12.25 The regression model for the RPD design assuming a linear pre-post relationship.
Summary
Table 12.3. Summary of the statistical models for the experimental and quasi-experimental research designs.
What is the difference between statistical significance and practical significance?
Give an example to support your answer
Discuss how the four assumptions underlying the GLM impact the data analysis process
Discuss and Debate
Statistical significance tells us how likely a difference between groups at least as large as the one observed would be if chance alone were operating (that is, if the null hypothesis were true). Practical significance tells us the degree to which the results have meaning in real life. Examples will vary.
By running the descriptive statistics first, researchers can check the data to be sure they conform to the four assumptions: 1) the relationships between variables are linear, 2) samples are random and independently drawn from the population, 3) variables have equal (homogeneous) variances, and 4) variables have normally distributed error. A researcher must test these assumptions, or conclusion validity will be threatened.