Chapter 18: Inferential Statistics
Educational Research:
Competencies for Analysis and Application
11/E
Geoffrey Mills and Lorraine Gay
© 2016, 2012, 2009, 2006 Pearson Education, Inc. All Rights Reserved
Gay & Mills
Educational Research, 11e
© 2016 Pearson Education, Inc. All rights reserved.
After reading Chapter 18, you should be able to do the following:
Explain the concepts underlying inferential statistics.
Select among tests of significance and apply them to your study.
Concepts Underlying Inferential Statistics
Inferential statistics are data analysis techniques for determining how likely it is that results obtained from a sample, or samples, are the same results that would be obtained from the entire population.
Concepts Underlying Inferential Statistics
Descriptive statistics describe the data collected, for example how often an event or score occurred.
Inferential statistics help researchers know whether they can generalize their findings to a population based upon their sample of participants.
Inferential statistics use data to assess likelihood, or probability.
Concepts Underlying Inferential Statistics
Standard Error
Inferences about populations are based on information from samples.
There is very little chance that any sample is identical to the population.
The expected, chance variation of sample means around the population mean is referred to as sampling error.
Sampling error is expected.
Sampling error tends to be normally distributed.
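To make sampling error concrete, the short Python sketch below draws a few random samples from a hypothetical population and prints how far each sample mean falls from the population mean. The population (normal, mean 50, SD 10), the sample size of 25, and the seed are illustrative assumptions, not values from the text.

```python
import random
import statistics

# Hypothetical population of 10,000 test scores (mean about 50, SD about 10).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)

# Draw five random samples of 25 and record each sample's sampling error:
# the distance between its mean and the population mean.
sampling_errors = []
for _ in range(5):
    sample = rng.sample(population, 25)
    sampling_errors.append(statistics.mean(sample) - mu)

for err in sampling_errors:
    print(f"sample mean - population mean = {err:+.2f}")
```

Each printed difference is typically small and nonzero; that scatter from sample to sample is the sampling error the slide describes.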
Standard Error
The distribution of sample means is approximately normal and has its own mean and standard deviation.
The standard deviation of the sample means is referred to as the standard error of the mean.
Our ability to estimate the standard error of the mean is affected by the size of the sample.
As the sample size increases, the standard error of the mean decreases.
Our ability to estimate the standard error of the mean is also affected by the size of the population standard deviation: the larger the standard deviation, the larger the standard error.
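Both effects can be seen in a small simulation. This Python sketch, assuming a normal population with mean 50 and a chosen SD, draws many samples of size N, takes the standard deviation of their means (an empirical standard error), and shows it shrinking as N grows. The function name, the population values, and the number of replications are illustrative assumptions.

```python
import random
import statistics

def sd_of_sample_means(pop_sd, n, reps=2000, seed=0):
    """Draw `reps` samples of size n from a normal population (mean 50,
    SD pop_sd) and return the standard deviation of the sample means,
    i.e., an empirical standard error of the mean."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.gauss(50, pop_sd) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Larger samples give a smaller spread of sample means.
for n in (10, 40, 160):
    print(n, round(sd_of_sample_means(10, n), 2))
```

The printed values should fall close to 10 divided by the square root of N (roughly 3.16, 1.58, 0.79), and doubling `pop_sd` doubles them, which matches the two factors listed on the slide.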
Standard Error
The standard error of the mean can be calculated by:
$SE_{\bar{X}}$ = the standard error of the mean
$SD$ = the standard deviation for a sample
$N$ = the sample size
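As a worked example of the chapter's formula, SE = SD / √(N − 1), the Python sketch below takes an already computed SD and a sample size. The function name and the numbers are hypothetical. It assumes SD was computed with N in the denominator; under that assumption the expression equals the also common form s / √N, where s uses N − 1.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean using SE = SD / sqrt(N - 1).
    `sd` is the sample standard deviation, `n` the sample size."""
    if n < 2:
        raise ValueError("need at least two scores")
    return sd / math.sqrt(n - 1)

# Hypothetical sample: SD = 10 for 101 students.
print(standard_error_of_mean(10, 101))            # 1.0
# Roughly quadrupling the sample size halves the standard error.
print(round(standard_error_of_mean(10, 401), 2))  # 0.5
```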
Hypothesis Testing
Hypothesis testing is the process of decision making in which researchers evaluate the results of a study against their original expectations.
Null hypothesis: Predicting no difference in scores
Research hypothesis: Predicting a difference in scores
We want to ensure that the differences we observe between groups are ‘real’ and did not occur by chance.
Hypothesis Testing
If the groups are significantly different, we reject the null hypothesis.
We do not ‘accept’ a research hypothesis, because we cannot prove it.
We instead report that our research hypothesis was supported.
If the expected differences are not found, we report that the null hypothesis was not rejected and that our research hypothesis was not supported.
Tests of Significance
Tests of significance allow us to test inferentially whether differences between scores in our sample are simply due to chance or are representative of the true state of affairs in the population.
To conduct a test of significance, we first select a probability level, known as the level of significance (alpha, or α).
Educational researchers usually use α = .05, meaning they accept a 5 in 100 chance of concluding that an observed difference is real when it actually occurred by chance.
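The decision rule and the reporting language from the Hypothesis Testing slides can be sketched in a few lines of Python. It assumes the test has already produced a p value (the probability of a result at least this extreme if the null hypothesis were true), which statistical software typically reports; the function name and message wording are illustrative, and the sketch treats p < α as significant.

```python
def report_decision(p_value, alpha=0.05):
    """Map a significance test's p value to the chapter's reporting
    language. Treats p < alpha as statistically significant."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return ("The null hypothesis was rejected; "
                "the research hypothesis was supported.")
    return ("The null hypothesis was not rejected; "
            "the research hypothesis was not supported.")

print(report_decision(0.012))
print(report_decision(0.31))
```

Note that neither message says the research hypothesis was "accepted" or "proved".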
Tests of Significance
Two-tailed and one-tailed tests
Tests of significance are almost always two-tailed.
Researchers will select a one-tailed test of significance only when they are quite certain that a difference will occur in only one direction.
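It is ‘easier’ to obtain a significant effect when predicting in one direction, because the entire α is placed in one tail of the distribution, so a smaller test statistic reaches significance.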
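Why a one-tailed test is ‘easier’ shows up in the cutoffs themselves. This Python sketch uses the standard normal (z) distribution as a simplifying assumption; t tests use the t distribution with the appropriate df, but the pattern is the same.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-tailed: alpha is split, .025 in each tail.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)
# One-tailed: all of alpha (.05) sits in the predicted tail.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

print(f"two-tailed: |z| must exceed {two_tailed_cutoff:.3f}")  # 1.960
print(f"one-tailed: z must exceed {one_tailed_cutoff:.3f}")    # 1.645
```

A result of z = 1.80 in the predicted direction would be significant one-tailed but not two-tailed.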
Type I and Type II Errors
Based upon a test of significance the researcher will either reject or not reject the null hypothesis.
The researcher makes a decision that the observed effect is or is not due to chance.
This decision is based upon probability, not certainty.
Type I and Type II Errors
Sometimes the researcher will erroneously reject the null hypothesis or will erroneously retain the null hypothesis.
When the researcher rejects a null hypothesis that is actually true, she has committed a Type I error.
When the researcher fails to reject a null hypothesis that is actually false (a true difference exists), she has committed a Type II error.
Type I and Type II Errors
| Researcher’s decision | True status of null hypothesis: True (should not be rejected) | True status of null hypothesis: False (should be rejected) |
|---|---|---|
| Does not reject the null hypothesis | Correct decision | Type II error |
| Rejects the null hypothesis | Type I error | Correct decision |
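The link between α and Type I errors can be checked by simulation. This Python sketch, under stated assumptions, generates many hypothetical studies in which the null hypothesis is true (both groups come from the same population), runs a two-tailed z test with known σ on each (a simplification of the t test), and reports the share of studies that wrongly reject the null. The function name, population values, and trial count are illustrative.

```python
import math
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, sigma=10.0, trials=4000, seed=1):
    """Simulate studies where the null hypothesis is TRUE and return the
    proportion that reject it anyway (the empirical Type I error rate)."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z
    se_diff = sigma * math.sqrt(2 / n)            # SE of a difference in means
    rejections = 0
    for _ in range(trials):
        group1 = [rng.gauss(100, sigma) for _ in range(n)]
        group2 = [rng.gauss(100, sigma) for _ in range(n)]
        z = (mean(group1) - mean(group2)) / se_diff
        if abs(z) > cutoff:
            rejections += 1
    return rejections / trials

print(round(type_i_error_rate(), 3))  # close to 0.05
```

Lowering `alpha` lowers the Type I error rate, at the cost of making real differences harder to detect (more Type II errors).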
Degrees of Freedom
After determining whether the significance test will be two-tailed or one-tailed and selecting a probability level (i.e., alpha), the researcher selects an appropriate statistical test to conduct the analysis.
Degrees of freedom are the number of observations free to vary around a parameter.
Each test of significance has its own formula for determining degrees of freedom (df).
The value for the df is important in determining whether the results are statistically significant.
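The idea of observations being "free to vary", and a few common df formulas, can be sketched in Python. The scores and function names below are hypothetical; the df formulas are the standard ones for the tests discussed later in this chapter.

```python
import statistics

# "Free to vary": if 5 scores must have a mean of 10, the first 4 can be
# anything, but the 5th is then forced, so only n - 1 = 4 are free.
required_mean, n = 10, 5
free_scores = [8, 12, 9, 14]                       # chosen arbitrarily
last_score = required_mean * n - sum(free_scores)  # 50 - 43 = 7
assert statistics.mean(free_scores + [last_score]) == required_mean


def df_independent_t(n1, n2):
    """t test for independent samples."""
    return n1 + n2 - 2


def df_nonindependent_t(n_pairs):
    """t test for nonindependent (matched) samples."""
    return n_pairs - 1


def df_one_way_anova(k_groups, n_total):
    """Returns (df between groups, df within groups)."""
    return k_groups - 1, n_total - k_groups


def df_chi_square(rows, cols):
    """Chi square test on a rows x cols table of frequencies."""
    return (rows - 1) * (cols - 1)
```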
Selecting Among Tests of Significance
The choice of a specific significance test depends on several factors:
Scale of measurement represented by the data (nominal, ordinal, interval, ratio)
Participant selection
Number of groups being compared
Number of independent variables
Significance tests applied incorrectly can lead to incorrect decisions.
Selecting Among Tests of Significance
The first decision in selecting an appropriate test is to determine whether a parametric or nonparametric test will be used.
Selecting Among Tests of Significance
Parametric tests require that the data meet several assumptions.
Variable must be normally distributed
Interval or ratio scale of measurement
Selection of participants is independent
Variance of the comparison groups is equal
Most parametric tests are fairly robust: they still give valid results when the assumptions are moderately violated.
If the assumptions are seriously violated, nonparametric tests should be used.
Selecting Among Tests of Significance
The t test
The t test is used to determine whether two groups of scores are significantly different from one another.
The t test compares the observed difference between means with the difference expected by chance.
Selecting Among Tests of Significance
The t test for independent samples is a parametric test of significance used to determine if differences exist between the means of two independent samples.
Independent samples are randomly formed, without any type of matching.
The assumption is that the group means are the same at the outset of the study, but there may be differences between the groups after treatment.
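The t test compares the observed difference between means with the difference expected by chance (the standard error of the difference). This Python sketch implements the common pooled-variance version for independent samples, which relies on the equal-variance assumption listed earlier; the function name and the scores are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t test for two independent samples.
    Returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # Sample variances (n - 1 in the denominator), pooled across groups.
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    # Difference expected by chance: standard error of the mean difference.
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, n1 + n2 - 2

treatment = [12, 14, 11, 15, 13]  # hypothetical posttest scores
control = [10, 11, 9, 12, 8]
t, df = independent_t(treatment, control)
print(f"t = {t:.2f}, df = {df}")  # t = 3.00, df = 8
```

The obtained t is then compared with the tabled critical value for the chosen α and df (2.306 for df = 8, two-tailed, α = .05), so this hypothetical difference would be significant.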
Selecting Among Tests of Significance
The t test for nonindependent samples is a parametric test of significance used to determine if differences exist between the means of two groups that are formed through matching.
When scores are nonindependent, they are systematically related.
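Because matched scores are systematically related, the nonindependent t test works on the difference within each pair rather than on two separate groups. This Python sketch is a minimal version of that approach; the function name and the matched scores are hypothetical.

```python
import math
import statistics

def nonindependent_t(scores_a, scores_b):
    """t test for matched (nonindependent) samples: a one-sample t test
    on the pairwise differences. Returns (t, df)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("matched samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores for four matched pairs (pair i in both lists).
matched_treatment = [11, 15, 9, 14]
matched_control = [10, 12, 8, 11]
t, df = nonindependent_t(matched_treatment, matched_control)
print(f"t = {t:.2f}, df = {df}")  # t = 3.46, df = 3
```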
Selecting Among Tests of Significance
Gain or difference scores are generally not compared with a t test.
Better strategies exist for analyzing such data.
e.g., t test on posttest scores (if there are no differences on pretest scores).
e.g., Analysis of covariance (if there are differences in pretest scores).
Selecting Among Tests of Significance
Analysis of variance
A simple (one-way) analysis of variance (ANOVA) is a parametric test used to determine whether scores from two or more groups are significantly different at a selected probability level.
ANOVA avoids the inflated Type I error rate that results from conducting multiple t tests on the same data.
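The error-rate problem can be quantified. If k tests are each run at level α and, as a simplifying assumption, are independent, the chance of at least one Type I error is 1 − (1 − α)^k. Pairwise t tests on the same groups are not fully independent, so this Python sketch gives an approximation; the function name is illustrative.

```python
def familywise_error_rate(alpha, k_tests):
    """Probability of at least one Type I error across k independent
    tests, each run at level alpha."""
    return 1 - (1 - alpha) ** k_tests

# Comparing 4 groups pairwise takes 4 * 3 / 2 = 6 t tests.
print(round(familywise_error_rate(0.05, 6), 3))  # 0.265
```

A single one-way ANOVA keeps the overall risk at α instead of roughly 26%.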
Analysis of Variance
An F ratio is computed to determine if sample means are significantly different.
The F ratio is the variance between groups divided by the variance within groups.
The larger the F ratio, the more likely it is that there are differences among groups.
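The F ratio described above can be computed directly from its definition. This Python sketch uses mean squares (sums of squares divided by their df) as the between-group and within-group variances; the function name and the three groups of scores are hypothetical.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA, where
    F = MS_between / MS_within and MS = sum of squares / df."""
    group_means = [statistics.mean(g) for g in groups]
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: how far each score lies from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

print(one_way_anova_f([4, 5, 6], [7, 8, 9], [10, 11, 12]))  # (27.0, 2, 6)
```

Here the group means (5, 8, 11) are far apart relative to the spread inside each group, so F is large.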
Analysis of Variance
If an ANOVA shows significant differences among groups, the researcher then must determine where these differences exist.
Multiple comparisons are used to determine where differences between groups are statistically significant.
Comparisons planned before collecting data are referred to as a priori.
Comparisons made after the data are collected are referred to as post hoc.
Analysis of Variance
When a factorial design is used and there are two or more independent variables analyzed, a factorial or multifactor analysis of variance is used to analyze the data.
MANOVA (multivariate analysis of variance) is an analytic procedure used when there is more than one dependent variable, with one or more independent variables.
Analysis of Variance
Analysis of covariance (ANCOVA) is a form of ANOVA that allows for control of extraneous variables and also is used as a means for increasing power of an analysis.
Power is increased in an ANCOVA because the within-group error variance is decreased.
When a study has two or more dependent variables and a covariate, MANCOVA is used.
Multiple Regression
Multiple regression is used to determine the amount of variance accounted for in a dependent variable by interval and ratio level independent variables.
Multiple regression combines variables that are known to predict the criterion variable into an equation.
Stepwise regression allows the researcher to enter one variable at a time.
Multiple Regression
Multiple regression is also the basis for path analysis.
Path analysis begins with a predictive model.
Path analysis determines the degree to which predictor variables interact with each other and contribute to variance in the dependent variables.
Chi Square
Chi square (χ²) is a nonparametric test used to test differences between groups when the data are frequency counts or percentages or proportions converted into frequencies.
A true category is one in which persons naturally fall.
An artificial category is one that is defined by the researcher.
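A chi square test compares observed frequencies with the frequencies expected if group membership and response were unrelated. This Python sketch computes the Pearson χ² statistic and its df for a table of counts; the function name and the 2 × 2 counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi square for a table of observed frequency counts.
    Expected count = row total * column total / grand total.
    Returns (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: two groups (rows) choosing between two options (columns).
print(chi_square_independence([[30, 20], [20, 30]]))  # (4.0, 1)
```

With df = 1, the critical value at α = .05 is 3.841, so χ² = 4.0 would be significant.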
Other Investigative Techniques
Data mining uses analytical tools to identify and predict patterns in datasets.
Factor analysis is a statistical procedure used to identify relations among variables in a correlation matrix.
Factor analysis is often used to reduce instruments to scales or subscales.
Other Statistical Procedures
Structural Equation Modeling (SEM)
Structural equation modeling is a combination of path analysis and factor analysis.
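SEM is a powerful analytic tool because it estimates the measurement of variables (as in factor analysis) and the relations among them (as in path analysis) at the same time.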
Formula for the standard error of the mean (Standard Error slide):
$$SE_{\bar{X}} = \frac{SD}{\sqrt{N - 1}}$$