Assgn 5

ChptOutlines25May2018.zip

Chapter6/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 6 – Selecting and Interpreting Inferential Statistics Study Guide

OBJECTIVES: The student will be able to:

1. Identify the general design classification for difference research questions.
2. Explain the distinctions of within subjects design versus between groups design classifications.
3. Utilize a decision tree (Figure 6.1) to guide the selection of appropriate inferential statistics (Tables 6.1-6.4).
   a. Identify the research problem.
   b. Identify the variables and their level of measurement.
   c. Select appropriate inferential statistic.
4. Describe the relationship between difference and associational inferential statistics as a function of the general linear model.
5. Interpret the results of a statistical test.
   a. Determine whether to reject the null hypothesis.
   b. Determine the direction of the effect.
   c. Evaluate the size of the effect.
6. Discuss the relationship between statistical significance and practical significance.

TERMINOLOGY:

• variables
• levels of measurement
• descriptive statistics
• inferential statistics
  o difference inferential statistics
  o associational inferential statistics
• difference question designs
• between group designs
• within subjects design (repeated measures design)
• single factor designs
• between groups factorial designs
• mixed factorial designs
• basic (bivariate) statistics
  o phi or Cramer’s V
  o eta
  o Pearson product moment correlation
  o Kendall’s tau or Spearman rho
• complex statistics
  o factorial ANOVA
  o multiple regression
  o discriminant analysis
  o logistic regression
  o MANOVA
  o ANCOVA
• loglinear
• general linear model
• statistical significance
  o critical value
  o calculated value
  o statistically significant
  o Sig.
• practical significance
• effect size
  o r family of effect size measures
  o d family of effect size measures
• confidence intervals

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter6/Chapter Outlines.pdf

Chapter 6 – Selecting and Interpreting Statistics Chapter Outline

I. General Design Classifications for Difference Questions
   A. Labeling difference question designs.
      1. State overall type of design (e.g. between groups, within subjects).
      2. State the number of independent variables.
      3. State the number of levels within each independent variable.
   B. Between groups designs: each participant in the research is in only one condition or group.
   C. Within subjects or repeated measures designs
      1. Within subjects designs.
         a. Each participant in the research receives or experiences all of the conditions or levels of the independent variable.
         b. Also includes designs where participants are matched (e.g. parent & child; husband & wife).
      2. Repeated measures designs: each participant is assessed more than once (e.g. pretest & posttest).
   D. Single factor (one-way) design
      1. Has only one independent variable.
      2. Factor and way are other terms for group difference independent variables.
   E. Between groups factorial design
      1. When there is more than one group difference independent variable.
      2. Each level of one factor (independent variable) is possible in combination with each level of the other factor(s).
         a. The number of levels of each factor is used in the description of the design.
         b. For example: a design that includes gender (2 levels) and ethnicity (4 levels) would be labeled as a 2 x 4 between groups factorial design.
   F. Mixed factorial design: Has both a between groups independent variable and a within subjects independent variable.
   G. Describing designs
      1. Each independent variable is described using one number that represents the number of levels for that variable.
      2. Example: 3 x 4 between groups factorial design would have 2 independent variables, one with 3 levels and one with 4 levels.
II. Selection of Inferential Statistics

   A. Types of research questions.
      1. Difference questions: compare groups and utilize difference inferential statistics (Tables 6.1 & 6.3).
         a. Basic (bivariate) statistics: one independent and one dependent variable.
         b. Complex statistics: three or more variables.
      2. Associational questions: examine the association or relationship between two or more variables and utilize associational inferential statistics (Tables 6.2 & 6.4).
   B. Using Tables 6.1 to 6.4 to Select Inferential Statistics
      1. Decide the number of variables.
         a. 2 variables = Tables 6.1 or 6.2
         b. 3 or more variables = Tables 6.3, 6.4 or 6.5
      (Basic 2 variable Questions and Statistics)
      2. If there are two variables and the independent variable is nominal or has 2-4 levels = Table 6.1.
         a. Identify number of levels of IV.
         b. Identify type of research design (between or within).
         c. Determine the type of measurement for the DV.
      3. If there are 2 variables and both are nominal, use the bottom rows of Table 6.1 (difference question) or Table 6.2 (associational question).
      4. If there are 2 variables and both variables have 5 or more ordered levels, use Table 6.2 (associational question).
      (Complex Questions and Statistics-3 or more variables)
      5. If there is one normal/scale DV and the IVs (2 or more) are nominal or have a few ordered levels, use Table 6.3.
      6. If there is one normal/scale DV and the IVs/predictors (2 or more) are normal/scale or dichotomous, use the top row of Table 6.4 (complex associational question).
      7. If there is one DV that is nominal or dichotomous and there are 2 or more IVs, use the bottom row of Table 6.4 (or 6.3).
      8. If there are 2 or more normal (scale) DVs, use the general linear model to do MANOVA.
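The selection steps in II.B can be read as a small decision procedure. What follows is a minimal Python sketch of those steps, not a reproduction of the book's Figure 6.1. The function name suggest_table, the level labels ("nominal", "dichotomous", "few_ordered", "normal"), the treatment of dichotomous variables as nominal in step 3, and the choice to send all-categorical predictors to Table 6.3 before checking Table 6.4 are assumptions made for illustration.

```python
# Hypothetical level labels: "few_ordered" stands for 2-4 ordered levels,
# "normal" for an approximately normal (scale) variable with 5+ ordered levels.
CATEGORICAL = {"nominal", "dichotomous"}
FEW_LEVELS = CATEGORICAL | {"few_ordered"}
NOT_COVERED = "Not covered by these steps"


def suggest_table(iv_levels, dv_levels):
    """Map lists of IV and DV measurement levels to the table named in steps 1-8."""
    if not iv_levels or not dv_levels:
        raise ValueError("need at least one IV and one DV")

    # Step 8: two or more normal (scale) DVs.
    if len(dv_levels) >= 2:
        if all(dv == "normal" for dv in dv_levels):
            return "General linear model (MANOVA)"
        return NOT_COVERED

    dv = dv_levels[0]

    # Basic questions: exactly two variables (one IV, one DV).
    if len(iv_levels) == 1:
        iv = iv_levels[0]
        if iv in CATEGORICAL and dv in CATEGORICAL:
            return "Table 6.1 (bottom rows) or Table 6.2"  # step 3
        if iv in FEW_LEVELS:
            return "Table 6.1"  # step 2
        if iv == "normal" and dv == "normal":
            return "Table 6.2"  # step 4
        return NOT_COVERED

    # Complex questions: three or more variables.
    if dv == "normal":
        if all(iv in FEW_LEVELS for iv in iv_levels):
            return "Table 6.3"  # step 5
        if all(iv in {"normal", "dichotomous"} for iv in iv_levels):
            return "Table 6.4 (top row)"  # step 6
        return NOT_COVERED
    if dv in CATEGORICAL:
        return "Table 6.4 (bottom row) or Table 6.3"  # step 7
    return NOT_COVERED


print(suggest_table(["dichotomous"], ["normal"]))
print(suggest_table(["normal", "dichotomous"], ["normal"]))
```

The function only names a table; choosing the row and column inside that table still depends on the design (between or within) and the number of levels, as steps 2a-2c describe.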

III. The General Linear Model (GLM)
   A. Difference between associational and difference questions.
      1. Mathematically, the distinction between associational and difference questions is artificial.
      2. Both associational and difference inferential statistics serve the purpose of exploring and describing relationships (Fig. 6.2).
         a. The GLM subsumes both associational and difference inferential statistics.
         b. The relationship between the IV and DV can be expressed by an equation with weights for each of the independent/predictor variables plus an error term.
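One way to see why the difference/associational distinction is artificial is to fit the equation DV = b0 + b1 * IV + error to a two-group comparison with the group coded 0 and 1. The sketch below uses made-up scores and a hand-written least-squares helper (ols_fit is a hypothetical name, not an SPSS procedure); under that coding, the weight b1 comes out equal to the difference between the two group means.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x + error."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: group is a dichotomous IV coded 0/1, score is the DV.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [10, 12, 11, 13, 15, 14, 16, 17]

b0, b1 = ols_fit(group, score)
mean_group0 = mean([s for g, s in zip(group, score) if g == 0])
mean_group1 = mean([s for g, s in zip(group, score) if g == 1])

print("intercept b0:", b0, "mean of group 0:", mean_group0)
print("weight b1:", b1, "mean difference:", mean_group1 - mean_group0)
```

The same data could be described as a group difference (a comparison of two means) or as an association (a weight relating group to score); the arithmetic is identical.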

IV. Interpreting the Results of a Statistical Test
   A. Statistical Significance
      1. The SPSS calculated value is compared to a critical value found in a statistics table.
      2. Statistically significant: probability (p) is less than the preset alpha (usually .05).
         a. Sig.: SPSS label for the p value.
         b. Usually, if the calculated value (t, F, etc.) is large, the probability (p) is small.
         c. Sig. is the probability of obtaining a result at least this extreme if the null hypothesis is true; the preset alpha is the probability of committing a Type I error (rejecting the null hypothesis when it is actually true).
      3. The p and the null hypothesis
         a. p > .05: don’t reject the null hypothesis; results are not statistically significant and could be due to chance.
         b. p < .05: reject the null hypothesis; results are statistically significant and are not likely due to chance.

   B. Practical Significance versus Statistical Significance
      1. Statistical significance does not necessarily ensure that the results have practical significance or are important.
      2. Effect size and/or confidence intervals must be examined to determine the strength of association.
         a. It is possible, with a large sample, to have a statistically significant result that is weak (small effect size).
         b. Small effect size may indicate that the difference or association is of little practical importance.
   C. Confidence Intervals
      1. An alternative to null hypothesis significance testing (NHST).
      2. May provide more practical information than NHST.
      3. A 95% confidence interval is built so that, across repeated samples, about 95% of such intervals contain the population mean difference.
   D. Effect Size
      1. The strength of the relationship between the independent variable and the dependent variable.
      2. r family of effect size measures
         a. Pearson correlation coefficient (r): values range from –1.0 to +1.0 (0 = no effect and +1/-1 = maximum effect).
         b. Also includes other associational statistics such as rho, phi, eta and the multiple correlation (R).
         c. Can be reported as a squared or unsquared value.
            i. Squared values (r²) indicate the percent of variance of the DV that can be predicted from the IV, but give small numbers that give an underestimated impression of the strength or importance of the effect.
            ii. Unsquared values (r) give a larger value and are recommended for r family indices.
      3. d family of effect size measures
         a. Focuses on the magnitude of the difference rather than the strength of the association.
         b. Computed by subtracting the mean of the second group from the mean of the first group and dividing by the pooled standard deviation of both groups.
         c. All d family effect sizes express effect sizes in standard deviation units.
         d. Values usually vary from 0 to +/- 1.0, but can be > 1.0.
      4. Issues about effect size measures.
         a. d is not available on SPSS outputs but can be calculated from information provided on SPSS outputs.
         b. r and R are available on SPSS outputs.
         c. Most journals now expect authors to discuss the effect size as well as statistical significance.
   E. Interpreting Effect Sizes
      1. Table 6.5 provides guidelines for the interpretation of effect sizes based upon the effect sizes usually found in the behavioral sciences and education.
      2. The meanings of large, medium, and small are relative to findings in these disciplines. Suggest using the following terms instead:
         a. Minimal in place of small.
         b. Typical in place of medium.
         c. Substantial in place of large.
      3. Cohen’s (1988) examples of effect size:
         a. Small = “difficult to detect”.
         b. Medium = “visible to the naked eye”.
         c. Large = “grossly perceptible”.
      4. Effect size is not the same as practical significance.
         a. Effect size indicates the strength of the relationship and is more relevant to practical significance than statistical significance.
         b. However, effect size measures are not direct indexes of the importance of a finding.
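Because d has to be worked out by hand from the means, standard deviations, and group sizes on the output, a short worked sketch may help. The Python below is an illustration on made-up scores, not SPSS output. The pooled standard deviation weights each group's variance by its degrees of freedom, which is one common pooling formula and is an assumption here since the outline does not spell one out. The helper names cohens_d and pearson_r are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_1, group_2):
    """(mean of group 1 - mean of group 2) / pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    s1, s2 = stdev(group_1), stdev(group_2)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group_1) - mean(group_2)) / pooled_sd


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for two groups (d family).
d = cohens_d([15, 14, 16, 17], [10, 12, 11, 13])
print("d =", round(d, 2))  # expressed in standard deviation units

# Hypothetical paired scores (r family): r and its squared value.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("r =", round(r, 2), "r squared =", round(r**2, 2))
```

Printing r next to r squared shows the point made in D.2.c: the squared value is always the smaller number, so it can make the same relationship look weaker.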

V. An Example of How to Select and Interpret Inferential Statistics
   A. Steps in the process:
      1. Identify the research problem.
      2. Identify the variables and their level of measurement.
      3. State the research question(s).
      4. Identify the type of each research question.
      5. Select an appropriate statistic.
      6. Interpret the results of the statistic.
         a. Determine if the results were statistically significant.
         b. If the results are statistically significant:
            i. Determine the direction of the effect.
            ii. Calculate and interpret the effect size.
            iii. If necessary, calculate and interpret confidence intervals to evaluate practical significance.
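Step 6 has a fixed order: significance first, then direction and effect size only when the result is significant. The following is a small Python sketch of that order, assuming the p value, the two group means, and an effect size have already been read from or computed from the output. The function name interpret_result and the wording of the returned labels are hypothetical.

```python
def interpret_result(p, mean_1, mean_2, effect_size, alpha=0.05):
    """Follow step 6: check significance, then direction, then effect size."""
    if p >= alpha:
        # Not significant: stop here, as in step 6a.
        return {"significant": False,
                "decision": "do not reject the null hypothesis"}

    if mean_1 > mean_2:
        direction = "group 1 scored higher"
    elif mean_1 < mean_2:
        direction = "group 2 scored higher"
    else:
        direction = "no difference in means"

    return {"significant": True,
            "decision": "reject the null hypothesis",
            "direction": direction,
            "effect_size": effect_size}


# Hypothetical values, as if copied from an output table.
print(interpret_result(p=0.048, mean_1=15.5, mean_2=11.5, effect_size=0.6))
print(interpret_result(p=0.21, mean_1=12.0, mean_2=11.5, effect_size=0.1))
```

The sketch treats p exactly equal to alpha as not significant, which is a choice made here because the outline only describes p above and p below .05.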

VI. Writing About Your Outputs
   A. Methods Chapter
      1. Update methods to include descriptive statistics about the demographics of the participants.
      2. Add literature based evidence about the reliability and validity of measures/instruments.
      3. Discuss if statistical assumptions were violated or not.
   B. Results Chapter
      1. Includes a description of the findings.
      2. Include figures and tables to illustrate the findings.
      3. Do not include a discussion of the findings in this section.
      4. Results of statistics should include:
         a. The value of the statistic (e.g. t = 2.05)
         b. The degrees of freedom (and N for chi-square)
         c. The p or Sig. value (e.g. p = .048)
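The three required pieces (statistic, degrees of freedom, p) can be assembled into a single reporting string. This is a minimal Python sketch: the helper names format_p, report_t, and report_chi_square are hypothetical, and the degrees of freedom and N in the usage lines are made-up values, since the outline's example gives only t = 2.05 and p = .048. The p value is printed without a leading zero to match the outline's own "p = .048" style.

```python
def format_p(p):
    """Format a p value to three decimals without a leading zero."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def report_t(t, df, p):
    """Statistic value, degrees of freedom, and p for a t test."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"


def report_chi_square(chi_square, df, n, p):
    """Chi-square also reports N alongside the degrees of freedom."""
    return f"χ²({df}, N = {n}) = {chi_square:.2f}, {format_p(p)}"


# df = 35 and N = 75 are hypothetical values used only to show the format.
print(report_t(2.05, 35, 0.048))
print(report_chi_square(4.10, 1, 75, 0.043))
```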

   C. Discussion Chapter
      1. Puts the findings in context to research literature, theory and the purposes of the study.
      2. Explain why the results turned out the way they did.

Chapter1/Chapter Guides.pdf

Chapter 1 - Variables, Research Problems and Questions Study Guide

OBJECTIVES: The student will be able to:

1. Explain the difference between research problems, research hypotheses, and research questions.
2. Provide definitions for different types of variables.
3. Identify the research question, research hypothesis, and types of variables used in a study.
4. Determine if a research question is a difference research question, an associational research question, or a descriptive research question.
5. Explain the relationship between the type of independent variable used in a study and the type of research question that can be answered (difference, associational, descriptive).
6. Discuss how the type of research questions drives the selection of the type of statistic.
7. Utilize the SPSS data editor and variable view features to examine the variables of an existing dataset.

TERMINOLOGY:

• research problem
• variable
  o independent variable (active vs. attribute)
  o dependent variable
  o extraneous variable
• operational definition
• randomized experimental study
• quasi-experimental study
• non-experimental study
• factor
• grouping variable
• values (categories, levels, groups, samples)
• variable label
• value label
• research hypotheses
• research question
  o difference research question
  o associational research question
  o descriptive research question
  o complex research question (multivariate)

ASSIGNMENTS: See additional activities for assignment examples.

Chapter1/Chapter Outlines.pdf

Chapter 1 – Variables, Research Problems and Questions Chapter Outline

I. Research Problems: Statement about the relationships between two or more variables.
II. Variables
   A. Definition: Characteristic of the participants or situation for a study
      1. Must be able to vary or have different values.
      2. Concepts that do not vary are called constants.
      3. Operational definition: defines a variable in terms of the operations or techniques used to measure it or make it happen.
   B. Independent Variables
      1. Active (manipulated) independent variable: can be given to participants within a specified period of time during the study.
         a. Are not necessarily manipulated by the experimenter.
         b. Treatment is always given after the study is planned.
         c. Randomized experimental & quasi-experimental studies must have active independent variables.
      2. Attribute (measured) independent variable: preexisting attributes of the persons or their ongoing environment.
         a. Cannot be manipulated by the experimenter.
         b. Non-experimental studies have attribute independent variables.
      3. Other terms for independent variables:
         a. factor
         b. grouping variable
      4. Inferences about cause and effect:
         a. Designs with active independent variables (experimental, quasi-experimental) can provide data to infer that the independent variable caused the change or difference in the dependent variable.
         b. Designs with attribute independent variables (non-experimental) should not be used to conclude a cause and effect relationship between the independent variable and the dependent variable.
      5. Values of the independent variable:
         a. Several options or values of a variable.
         b. Also called: categories, levels, groups, samples
   C. Dependent Variables
      1. Presumed outcome or criterion that is supposed to measure or assess the effect of the independent variable.
      2. Must have at least two values, but usually have many values that vary from high to low.
   D. Extraneous Variables
      1. Not of interest in a particular study but could influence the dependent variable.
      2. May also be called nuisance variables or covariates.

III. Research Hypothesis and Questions
   A. Research hypothesis: predictive statements about the relationship between variables.
   B. Research questions: similar to hypotheses, but do not make specific predictions.
      1. Difference research questions: compare two or more different groups on the dependent variable
         a. Utilize difference inferential statistics (e.g. ANOVA or t-test)
      2. Associational research questions: find the strength of association between variables or to make predictions about a variable from one or more variables.
         a. Utilize associational inferential statistics (e.g. correlation, multiple regression)
      3. Descriptive research questions: summarize or describe data without trying to generalize to a larger population of individuals.
      4. Complex research questions: involve more than two variables at a time.
         a. Utilize complex inferential statistics.
         b. May be called multivariate in some books.

IV. Sample Research Problem: The Modified High School and Beyond (HSB) Study
   A. Research Problem: What factors influence mathematics achievement?
      1. Identify primary dependent variable
      2. Identify independent and extraneous variables
      3. Identify types of independent variables (active vs. attribute)
      4. Identify the research approach (experimental, quasi-experimental, non-experimental)
   B. SPSS Variable View
      1. Columns give information on database variables
         a. Name shows the variable name
         b. Label gives a longer description of the variable
         c. Values shows assigned value labels
         d. Missing identifies if certain values are designated by user for missing values
   C. SPSS Data Editor
      1. Shows raw data
         a. Variables are across the top (identified by short variable names)
         b. Participants are listed down the left side.
   D. Research Questions for the Modified HSB Study
      1. Descriptive questions (Chapter 4)
      2. To examine continuous variables for normality (Chapter 4).
      3. Determine relationships between two categorical variables with crosstabulations (Chapter 8).
      4. Associational questions (Chapter 9)
      5. Complex associational questions (Chapter 9)
      6. Basic difference questions (Chapter 10)
      7. Complex difference questions (Chapter 11)

Chapter2/Chapter Guides.pdf

Chapter 2 – Data Coding, Entry, and Checking Study Guide

OBJECTIVES: The student will be able to:

1. Describe the steps necessary to plan, pilot test and collect data.
2. Prepare data for entry into SPSS or a spreadsheet.
3. Define and label variables.
4. Display your SPSS codebook (dictionary).
5. Enter data into SPSS or a spreadsheet.
6. Check accuracy of data entry using SPSS Descriptive Statistics.

TERMINOLOGY:

• pilot study
• content validity
• coding
• dummy coding
• codebook
• define variables
• label variables
• missing values
• data entry form
• descriptive statistics

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter2/Chapter Outlines.pdf

Chapter 2 – Data Coding, Entry, and Checking Chapter Outline

I. Plan the Study, Pilot Test, and Collect Data
   A. Plan the study
      1. Identify the research problem, question and hypothesis.
      2. Plan the research design.
   B. Select or develop the instrument(s)
      1. Select from available instruments
      2. Modify available instruments
      3. Develop your own instruments
   C. Pilot test and refine the instruments
      1. Try out instrument on friends or colleagues
      2. Conduct pilot study with a similar sample population
      3. Utilize experts to check content validity of instrument items
   D. Collect the data
      1. Use methods appropriate for selected instruments
      2. Check raw data before entering
      3. Set “rules” for dealing with problematic responses.

II. Code Data for Data Entry
   A. Rules for data coding (assigning numbers to values or levels of a variable)
      1. All data should be numeric.
      2. Each variable for each case or participant must occupy the same column in the SPSS Data Editor.
      3. All values (codes) for a variable must be mutually exclusive.
      4. Each variable should be coded to obtain maximum information.
      5. For each participant, there must be a code or value for each variable.
      6. Apply any coding rules consistently for all participants.
      7. Use high numbers (value or code) for the “agree”, “good”, or “positive” end of a variable that is ordered.
   B. Make a coding form: to streamline data entry processes
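Several of the coding rules (numeric codes, mutually exclusive values, a code for every participant, high numbers for the positive end) can be shown with a small codebook. The Python sketch below is an illustration only: the response wording, the dictionary names, and the choice of 9 as a user-defined missing code are assumptions, not values from the book's data files.

```python
# Rule 7: higher numbers for the "agree" end of an ordered variable.
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Dummy coding for a dichotomous variable (0/1).
HAS_CHILDREN_CODES = {"no": 0, "yes": 1}

MISSING_CODE = 9  # hypothetical code kept outside the valid range (rule 5)


def code_response(raw, codebook, missing=MISSING_CODE):
    """Apply one codebook consistently (rule 6); unknown answers become missing."""
    key = raw.strip().lower()
    return codebook.get(key, missing)


answers = ["Agree", " strongly agree ", "left blank", "Disagree"]
print([code_response(a, AGREEMENT_CODES) for a in answers])
```

Keeping the mapping in one place is the point of a coding form: every participant's answers pass through the same rules, and the missing code can later be declared as missing in the statistics package so it is excluded from means.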

III. Problem 2.1: Check the Completed Questionnaires (follow instructions in book)

IV. Problem 2.2: Define and Label the Variables (follow instructions in book)

V. Problem 2.3: Display Your Dictionary or Codebook (follow instructions in book)

VI. Problem 2.4: Enter Data (follow instructions in book)

VII. Problem 2.5: Run Descriptives and Check the Data (follow instructions in book)

Chapter2/Extra SPSS Problems.pdf

Chapter 2 – Data Coding, Entry, and Checking

Using the college student data.sav file, from http://www.psypress.com/ibm-spss-intro-stats/ (“Data Sets (ZIPS)” button) or the Moodle Web site for this book, do the following problems. Print your outputs and circle the key parts for discussion.

1. Compute the N, minimum, maximum, and mean for all the variables in the college student data file. How many students have complete data? Identify any statistics on the output that are not meaningful. Explain.

   There are 47 students who have complete data. This value is found by looking at the value given for the Valid N (listwise). The mean is not meaningful for nominal (unordered) variables. In this example, nominal variables include: gender of student, marital status, and age group. The mean for dichotomous variables coded as 0 and 1 can be meaningful because the mean gives the proportion of students coded “1”, which can be read as a percent. In this example, the following variables are dichotomous: does subject have children, television shows-sitcoms, television shows-movies, television shows-sports, television shows-news.

2. What is the mean height of the students? What about the average height of the same sex parent? What percentage of students are males? What percentage have children?

   Mean height of the students = 67.30 inches
   Average height of same sex parent = 66.78 inches
   Percentage of students that are male = 52.0%
   Percentage of students with children = 52.0%
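Two ideas in the answers above are easy to check by hand: Valid N (listwise) counts only cases with no missing value on any variable, and the mean of a 0/1 variable is the proportion of cases coded 1. The Python sketch below uses four made-up cases, not the college student data file; the variable names and the helper names valid_n_listwise and mean_of are hypothetical, and None stands in for a missing value.

```python
from statistics import mean

# Hypothetical cases; None marks a missing value.
cases = [
    {"height": 67, "has_children": 1},
    {"height": 70, "has_children": 0},
    {"height": None, "has_children": 1},
    {"height": 64, "has_children": 0},
]


def valid_n_listwise(rows):
    """Count cases that have a value for every variable."""
    return sum(all(value is not None for value in row.values()) for row in rows)


def mean_of(rows, variable):
    """Mean of one variable, skipping cases where it is missing."""
    return mean(row[variable] for row in rows if row[variable] is not None)


print("Valid N (listwise):", valid_n_listwise(cases))
print("Mean height:", mean_of(cases, "height"))

# Mean of a 0/1 variable = proportion coded 1; multiply by 100 for a percent.
print("Percent with children:", 100 * mean_of(cases, "has_children"))
```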

Chapter3/Chapter Guides.pdf

Chapter 3 – Measurement and Descriptive Statistics Study Guide

OBJECTIVES: The student will be able to:

1. Utilize frequency distributions to determine if data is normally distributed.
2. Define the various levels of measurement (nominal, ordinal, interval, ratio, etc.) and recognize terms that are used interchangeably.
3. Distinguish between the types of measurement (e.g. nominal vs. ordered, ordinal vs. normal).
4. Utilize SPSS to generate descriptive statistics (frequency distributions, measures of central tendency, measures of variability) for a data set.
5. Select the appropriate descriptive statistics based upon the level of measurement of the data.
6. Describe the difference between parametric and non-parametric statistics.
7. Describe the properties of the normal curve.
8. Determine whether data is normally distributed and describe types of non-normality exhibited (skewness, kurtosis, etc.).
9. Explain the relationship between the area under the normal curve and probability distributions.
10. Explain the purpose of converting data to a standard normal curve and generating z-scores.

TERMINOLOGY:

• frequency distribution
  o approximately normally distributed
  o not normally distributed
  o negatively skewed
  o positively skewed
• levels of measurement
  o nominal (categorical, qualitative, discrete)
  o dichotomous
  o ordinal (ranks)
  o interval
  o ratio
  o scale
  o approximately normal (continuous, dimensional, quantitative)
• descriptive statistics
  o frequency tables
  o bar charts
  o histograms
  o frequency polygons
  o box and whiskers plot
• measures of central tendency
  o mean
  o median
  o mode
• measures of variability
  o range
  o minimum
  o maximum
  o standard deviation
  o skewness
  o kurtosis
  o interquartile range
• parametric vs. nonparametric statistics
• power
• normal curve
  o area under the normal curve
  o standard normal curve
  o z scores
• kurtosis

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter3/Chapter Outlines.pdf

Chapter 3 – Measurement and Descriptive Statistics Chapter Outline

I. Frequency Distributions
   A. Definition: tally of the number of times each score on a single variable occurs.
   B. Approximately normally distributed: there is a small number of scores for the low and high values and most of the scores occur in the middle values (distribution exhibits a “normal curve”).
   C. Not normally distributed: distribution does not exhibit a normal curve.
      1. Negatively skewed: tail of the curve (extreme scores) is elongated on the low end (left side).
      2. Positively skewed: tail of the curve (extreme scores) is elongated on the high end (right side).
II. Levels of Measurement
   A. Measurement: the assignment of numbers or symbols to different characteristics (values) of the variables.
   B. Nominal Variables: numerals assigned to each category stand for a name of category.
      1. Categories have no implied order or value.
      2. Categories are distinct and non-overlapping.
      3. Other terms for nominal variables:
         a. Categorical
         b. Qualitative
         c. Discrete
   C. Dichotomous Variables: have only two levels or categories.
      1. May or may not have an implied order
      2. Other terms for dichotomous variables:
         a. dummy variables
         b. discrete variables
         c. categorical variables
   D. Ordinal Variables: mutually exclusive categories that are ordered from low to high, but the intervals between categories may not be equal.
      1. Also includes ordered variables with only a few categories (2-4)
      2. Distribution of the scores is not normally distributed.
      3. Other terms for ordinal variables:
         a. Ranks
         b. Categorical
   E. Approximately Normal (or Scale) Variables: levels or scores are ordered from low to high and the frequencies of the scores are approximately normally distributed.
      1. May be continuous (have an infinite number of possible values within a range).
      2. If not continuous, should have at least five ordered values or levels.
      3. Other terms for approximately normal variables:
         a. interval – have ordered categories that are equally spaced
         b. ratio – have ordered categories that are equally spaced and have a true zero
         c. continuous
         d. dimensional
         e. quantitative
   F. How to Distinguish Between the Types of Measurement
      1. Nominal versus ordinal variables:
         a. Only two levels = treat as nominal in SPSS
         b. Three or more categories and not ordered = nominal
         c. Three or more categories and ordered = ordinal
      2. Ordinal versus normal (scale) variables:
         a. Five or more ordered levels with equal intervals and approximately normal distribution = normal
         b. Three or more ordered levels with unequal intervals and not normally distributed = ordinal
III. Descriptive Statistics

A. Frequency Tables: tabulates the number of occurrences of each level of a variable as well as the number of missing values; also calculates the valid percent and cumulative percent for each level.

1. Nominal data: order of categories in table is arbitrary; cumulative percent column is not useful

2. Ordinal or approximately normal data: order of categories in tables is shown from low to high; cumulative percent column is useful.

B. Bar Charts: creates discrete (not connected) columns to illustrate the frequency distribution; appropriate for nominal data.

C. Histograms: similar to a bar chart, but there are no spaces between the bars which indicates a continuous variable underlying the scores.

D. Frequency Polygons: connects points between the categories; best used with approximately normal data (but can be used with ordinal data).

E. Box and Whiskers Plot: useful for ordinal and normal data; gives a graphical representation of the distribution of scores.

1. Box: middle 50% of cases (those between the 25th and 75th percentiles)

2. Whiskers: represent the expected range of scores. 3. Outliers: scores that fall outside the box and whiskers.

F. Measures of Central Tendency 1. Mean: the arithmetic average; statistic of choice for normally

distributed data. 2. Median: the middle score; appropriate measure for ordinal data

or data that is skewed. 3. Mode: the most common category; can be used with any type of

data, but is the least precise information about central tendency. G. Measures of Variability: tells about the spread or dispersion of scores.

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

1. Range: highest score minus the lowest score; does not give an indication of spread of scores for ordered data.

2. Standard Deviation: most common measure of variability; based upon the deviation of each score from the mean of all scores; most appropriate for normally distributed data.

3. Interquartile Range: the distance between the 25th and 75th percentiles (as shown in the box plot); appropriate for ordinal data.

4. Nominal Data: variability measures are not appropriate; rather look at the number of categories and the frequency counts.
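As a rough illustration of the three variability measures listed above, the following Python sketch computes them for a small set of made-up scores. The scores and variable names are hypothetical, and `statistics.quantiles` uses its own default percentile method, so the quartiles may differ slightly from the ones SPSS reports.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 7, 9, 10, 12]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Standard deviation: based on the deviation of each score from the mean
# (sample formula, dividing by n - 1).
sd = statistics.stdev(scores)

# Interquartile range: distance between the 25th and 75th percentiles.
q1, _q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(f"range = {score_range}, sd = {sd:.2f}, iqr = {iqr:.2f}")
```

For nominal data none of these would be meaningful; a frequency count per category is the more appropriate summary.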

H. Conclusions About Measurement and the Use of Statistics

1. Normal data: utilize means and standard deviations for parametric statistics.

2. Ordinal data: utilize median and nonparametric tests.

3. Nominal data: utilize mode or count.

IV. The Normal Curve

A. Properties of the Normal Curve: the normal curve is theoretically formed by counting an “infinite” number of occurrences of a variable.

1. Unimodal – the distribution has one hump which is in the middle of the distribution.

2. The mean, median and mode are equal.

3. The curve is symmetric (not skewed).

4. The range is infinite (the extremes never touch the X axis).

5. The curve is not too peaked or too flat and is neither too short nor too long (does not exhibit kurtosis).

B. Non-Normally Shaped Distributions

1. Skewness: one tail of the frequency distribution is longer than the other.

2. Mean and median are different.

C. Kurtosis

1. Refers to the shape of the curve.

2. Leptokurtic (positive kurtosis): frequency distribution is more peaked than normal.

3. Platykurtic (negative kurtosis): frequency distribution is flatter than normal.

D. Area Under the Normal Curve (Figure 3.10)

1. The normal curve is a probability distribution whose area is equal to 1.0 and portions of the curve are fractions of 1.0.

2. Areas of the curves can be divided in terms of standard deviations.

a. 34% of area under the normal curve is between the mean and 1 standard deviation above or below the mean (thus, 68% of the area under the normal curve is within 1 standard deviation to the left and right of the mean).

b. 13.5% of the area under the normal curve is accounted for by adding a second standard deviation to the first (thus, 95% of the area under the normal curve is within 2 standard deviations to the left and right of the mean).

c. About 5% of the area under the normal curve falls beyond 2 standard deviations to the left and right of the mean (this is why values not falling within 2 standard deviations of the mean are seen as relatively rare events).
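The rounded percentages above can be checked against the standard normal distribution. The sketch below is only an illustration using Python's standard-library `statistics.NormalDist`; the variable names are my own and nothing here comes from the textbook itself.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

mean_to_1sd = z.cdf(1) - z.cdf(0)    # about 0.34
one_to_2sd = z.cdf(2) - z.cdf(1)     # about 0.136
within_1sd = z.cdf(1) - z.cdf(-1)    # about 0.68
within_2sd = z.cdf(2) - z.cdf(-2)    # about 0.95
beyond_2sd = 1 - within_2sd          # about 0.05

for label, area in [
    ("mean to 1 SD", mean_to_1sd),
    ("1 SD to 2 SD", one_to_2sd),
    ("within 1 SD", within_1sd),
    ("within 2 SD", within_2sd),
    ("beyond 2 SD", beyond_2sd),
]:
    print(f"{label}: {area:.3f}")
```

The exact areas are slightly different from the rounded figures in the outline (for example, about 95.4% rather than 95% within 2 standard deviations), which is why the outline's numbers are best read as approximations.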

E. The Standard Normal Curve

1. A normal curve converted so the mean is equal to 0 and the standard deviation is equal to 1.

2. This conversion allows comparison of normal curves with different means and standard deviations.

3. z scores = units of the standard normal distribution

a. standard scores = term for raw scores that are converted to the standard normal curve.
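A minimal sketch of the conversion to z scores, assuming the usual formula z = (raw score - mean) / standard deviation. The two tests and all of their numbers are hypothetical; they only show how standard scores make results on different scales comparable.

```python
def z_score(raw, mean, sd):
    """Convert a raw score to standard deviation units from the mean."""
    return (raw - mean) / sd

# Hypothetical example: one student, two tests with different scales.
z_math = z_score(raw=70, mean=60, sd=10)       # 1.0 SD above the mean
z_verbal = z_score(raw=550, mean=500, sd=100)  # 0.5 SD above the mean

print(z_math, z_verbal)
```

Although 550 is the larger raw score, the math result is the stronger one relative to its own distribution.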

Chapter3/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 3 – Measurement and Descriptive Statistics

Use the hsbdata.sav file from http://www.psypress.com/ibm-spss-intro-stats/ (“Data Sets (ZIPS)” button) to do these problems with one or more of these variables: math achievement, mother’s education, ethnicity, and gender. Use Tables 3.2, 3.3, and the instructions in the text to produce the appropriate plots or descriptive statistics. Be sure that the plots and/or descriptive statistics make sense (i.e., that they are a “good choice” or “OK”) for the variable.

3.1 Create bar charts. Discuss why you did or didn’t create each.

• Select Analyze => Descriptive Statistics => Frequencies.
• Move math achievement, mother’s education, ethnicity, and gender into the Variables box.
• Select Charts => Bar Charts => Continue => OK.

Bar charts can be used with any of the four levels of measurement, but it is better to use frequency polygons or histograms if you have normally distributed data. Each of these types of plots displays the frequency or number of subjects on the Y or vertical axis and shows the levels or values of the variables on the X axis of the plot. In histograms and frequency polygons, the bars or points are connected, implying that the levels of the variable are ordered from low to high. In a bar chart, the bars are separated, implying that there might not be an order to the levels or categories of the variable.

3.3 Create frequency polygons. Discuss why you did or didn’t create each. Compare the plots in 3.1, 3.2, and 3.3.

• Select Graphs => Line. Click Simple and Summaries for groups of cases.
• Click Define.
• Move math achievement into the Category Axis box. => OK.
• Repeat the steps above, except this time instead of moving math achievement, move mother’s education into the Category Axis box. => OK.

Frequency polygons and histograms are similar. They are designed for normally distributed data but are okay to use with ordinal variables. A frequency polygon connects the midpoints of the top of each bar in a histogram. In other words, you can make a frequency polygon from a histogram by taking a straight edge and connecting the middle of the top of each of the bars.

3.5 Compute the mean, median, and mode. Discuss which measures of central tendency are meaningful for each of the four variables.

• Select Analyze => Descriptive Statistics => Frequencies.
• Move the four variables into the Variables box.
• Statistics => Mean, Median, Mode => Continue => OK.

Although the mean, median, and mode are okay to use with ordinal or normal data, the mean is the most appropriate with normal data and the median is best with ordinal data.

Neither the mean nor the median is meaningful with nominal data. If you ask SPSS to compute a mean or median for ethnicity, it will do so, but because the ethnic categories are not in any order, the result would not be interpretable. The mode would tell you which ethnic group was the largest. Similarly, the mode (and median) tell you which level of a dichotomous variable is most frequent. The mean of a dichotomous variable (e.g., gender) is the proportion of participants who have the higher value (i.e., female, in this case); multiplied by 100, it is the percent.
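The point about dichotomous variables can be checked with a short Python sketch. It assumes gender is coded 0 = male and 1 = female (an assumption consistent with the reported mean of .55) and uses the counts from the gender frequency table (34 males, 41 females).

```python
import statistics

# Assumed coding: 0 = male, 1 = female; counts taken from the gender table.
gender = [0] * 34 + [1] * 41

mean = statistics.mean(gender)      # proportion of cases coded 1
median = statistics.median(gender)  # middle case when sorted
mode = statistics.mode(gender)      # most frequent code

print(round(mean, 2), median, mode)
```

Under that coding the mean is 41/75, so it rounds to .55, and both the median and the mode fall on the code for the larger group.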

Fig. E.3

Fig. E.4

Ch.3 Output 2.0 Frequencies

Statistics

                   math achievement test   mother's education   ethnicity   gender
N    Valid                            75                   75          73       75
     Missing                           0                    0           2        0

Frequency Table

math achievement test

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    -1.67                1       1.3             1.3                  1.3
          1.00                2       2.7             2.7                  4.0
          2.33                1       1.3             1.3                  5.3
          3.67                3       4.0             4.0                  9.3
          4.00                2       2.7             2.7                 12.0
          5.00                5       6.7             6.7                 18.7
          5.33                1       1.3             1.3                 20.0
          6.33                2       2.7             2.7                 22.7
          6.67                1       1.3             1.3                 24.0
          7.67                4       5.3             5.3                 29.3
          8.00                1       1.3             1.3                 30.7
          9.00                4       5.3             5.3                 36.0
          9.33                1       1.3             1.3                 37.3
         10.33                4       5.3             5.3                 42.7
         10.67                1       1.3             1.3                 44.0
         11.67                2       2.7             2.7                 46.7
         12.00                2       2.7             2.7                 49.3
         13.00                3       4.0             4.0                 53.3
         14.33                9      12.0            12.0                 65.3
         14.67                1       1.3             1.3                 66.7
         15.67                2       2.7             2.7                 69.3
         17.00                5       6.7             6.7                 76.0
         18.33                1       1.3             1.3                 77.3
         18.67                1       1.3             1.3                 78.7
         19.67                3       4.0             4.0                 82.7
         20.33                1       1.3             1.3                 84.0
         21.00                3       4.0             4.0                 88.0
         22.33                2       2.7             2.7                 90.7
         22.67                1       1.3             1.3                 92.0
         23.67                6       8.0             8.0                100.0
         Total               75     100.0           100.0

mother's education

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    < h.s.              17      22.7            22.7                 22.7
         h.s. grad           31      41.3            41.3                 64.0
         < 2 yrs voc          2       2.7             2.7                 66.7
         2 yrs voc            5       6.7             6.7                 73.3
         < 2 yrs coll         7       9.3             9.3                 82.7
         > 2 yrs coll         5       6.7             6.7                 89.3
         coll grad            3       4.0             4.0                 93.3
         master's             3       4.0             4.0                 97.3
         MD/PhD               2       2.7             2.7                100.0
         Total               75     100.0           100.0

ethnicity

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Euro-Amer           41      54.7            56.2                 56.2
         African-Amer        15      20.0            20.5                 76.7
         Latino-Amer         10      13.3            13.7                 90.4
         Asian-Amer           7       9.3             9.6                100.0
         Total               73      97.3           100.0
Missing  multiethnic          1       1.3
         blank                1       1.3
         Total                2       2.7
Total                        75     100.0

gender

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    male                34      45.3            45.3                 45.3
         female              41      54.7            54.7                100.0
         Total               75     100.0           100.0

Bar Charts

Fig. E.5

Fig. E.6

Ch. 3 Output 3.3

GRAPH /LINE(SIMPLE)= COUNT BY mathach.

GRAPH /LINE(SIMPLE)= COUNT BY maed.

Fig. E.7

Ch. 3 Output 1.1

FREQUENCIES VARIABLES=mathach maed ethnic gender
  /STATISTICS=MEAN MEDIAN MODE
  /ORDER=ANALYSIS.

Frequencies

Statistics

                   math achievement test   mother's education   ethnicity   gender
N    Valid                            75                   75          73       75
     Missing                           0                    0           2        0
Mean                             12.5645                 4.11        1.77      .55
Median                           13.0000                 3.00        1.00     1.00
Mode                               14.33                    3           1        1

Chapter4/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 4 – Understanding Your Data and Checking Assumptions Study Guide

OBJECTIVES: The student will be able to:

1. Describe the purpose of Exploratory Data Analysis (EDA).
2. Explain the purpose of statistical assumptions.
3. Select appropriate types of analyses and plots to conduct EDA based upon the level of measurement of the variable.
4. Utilize SPSS to conduct EDA.
5. Interpret SPSS output from EDA.

TERMINOLOGY:
• exploratory data analysis (EDA)
• statistical assumptions
  o homogeneity of variances
  o normality
• parametric tests
• robust
• non-parametric tests
• skewness
  o positive skew
  o negative skew

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter4/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 4 – Understanding Your Data and Checking Assumptions Chapter Outline

I. Exploratory Data Analysis (EDA)

A. What is EDA?

1. The first step to complete after entering data and before running any inferential statistics.

2. Computing various descriptive statistics and graphs in order to examine your data.

a. Look for data errors, outliers, non-normal distributions, etc.

b. Determine if the data meets the assumptions of the statistics you plan to use.

c. Gather basic demographic information about the subjects.

d. Examine relationships between the variables to determine how to conduct the hypothesis testing.

B. How to do EDA

1. Generate plots of the data.

2. Generate numbers from the data.

C. Check for Errors

1. Examine raw data before entering.

2. Compare some raw data against entered data.

3. Compare maximum and minimum values against the allowable ranges.

4. Examine the means and standard deviations to see if they seem reasonable.

5. Look to see if there is an unreasonable amount of missing data.

6. Look for outliers.
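Some of the checks above (comparing values against allowable ranges and looking at the amount of missing data) can be sketched in a few lines of Python. The codebook ranges, variable names, and data values below are all hypothetical; this is an illustration of the idea, not the procedure SPSS uses.

```python
# Hypothetical codebook: allowable (minimum, maximum) for each variable.
codebook = {"height": (48, 84), "gender": (1, 2), "age_group": (1, 3)}

# Hypothetical entered data; None marks a missing value.
data = {
    "height": [67, 72, 6.5, 70, None],  # 6.5 looks like feet, not inches
    "gender": [1, 2, 2, 3, 1],          # 3 is not an allowable code
    "age_group": [1, 3, 2, 2, 1],
}

def out_of_range(values, low, high):
    """Return the non-missing values that fall outside [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]

def missing_fraction(values):
    """Return the share of cases with a missing value."""
    return sum(v is None for v in values) / len(values)

problems = {name: out_of_range(vals, *codebook[name]) for name, vals in data.items()}

for name in data:
    print(name, "out of range:", problems[name],
          "missing:", missing_fraction(data[name]))
```

Any value flagged this way should be traced back to the raw questionnaire before deciding whether it is an entry error or a genuine outlier.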

D. Statistical Assumptions: explain when it is and isn’t reasonable to perform a specific statistical test.

1. Parametric tests

a. Usually have more assumptions than nonparametric tests.

b. Generally designed for use with data that exhibit an approximately normal distribution.

c. Some parametric tests are more robust in dealing with violations of assumptions than others.

2. Nonparametric tests

a. Have fewer assumptions.

b. Can often be used when assumptions for parametric tests are violated.

E. Parametric Tests

Chapter4/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 4 – Understanding Your Data and Checking Assumptions

Using the College Student data file, do the following problems. Print your outputs and circle the key parts of the output that you discuss.

4.1 For the variables with five or more ordered levels, compute the skewness. Describe the results. Which variables in the data set are approximately normally distributed/scale? Which ones are ordered but not normal?

• Select Analyze => Descriptive Statistics => Descriptives.
• Move student height, same sex parent’s height, amount of tv watched per week, hours of study per week, student’s current gpa, positive evaluation-institution, positive evaluation-major, positive evaluation-facilities, positive evaluation-social life, and hours per week spent working into the Variables box.
• Options => Check Skewness (in addition to Mean, Std. Deviation, Minimum, and Maximum) => Continue => OK.

The Valid N (listwise) for the variables selected is 48. The Means all seem reasonable and within the expected range. The Minimum and Maximum values are all within the expected range, based on the codebook. The N for each variable makes sense and only two variables are missing values (positive evaluation-major and hours per week spent working).

The Skewness Statistic is utilized to determine which of these variables are approximately normally distributed. The guideline is that if the Skewness Statistic is between -1 and 1, the variable is at least approximately normal. In this case, all the variables with five or more ordered levels fall into that range and would be considered approximately normally distributed. For this dataset, the ordinal variables with five or more ordered levels (positive evaluation-institution, positive evaluation-major, positive evaluation-facilities, positive evaluation-social life) are all approximately normally distributed, so we can treat them like scale variables and use inferential statistics that assume normality with them. None of the variables examined for this problem was markedly non-normal.

4.3 Which variables are nominal? Run frequencies for the nominal variables and other variables with fewer than five levels. Comment on the results.

• Select Analyze => Descriptive Statistics => Frequencies.
• Move gender of student, marital status, age group, does subject have children, television shows-sitcoms, television shows-movies, television shows-sports, and television shows-news shows into the Variables box.

The table titled Statistics provides the number of participants for whom we have Valid data and the number of Missing data. No other statistics were requested because almost all of them are not appropriate to use with nominal and dichotomous data. Age group has three ordered levels so it is ordinal and the median would be appropriate.

The other tables are labeled Frequency Table and there is one for each of the variables selected. The left-hand column shows the Valid categories (or levels or values), Missing values, and Total number of participants. The Frequency column gives the number of participants who had each value. The Percent column is the percent who had each value, including missing values. For example, in the marital status table, 40.0% of ALL participants were single, 36.0% were married, 22.0% were divorced, and 2.0% were missing. The Valid Percent shows the percent of those with nonmissing data at each value; e.g. 40.8% of the 49 students with valid data were single. Finally, Cumulative Percent is the percent of the subjects in a category plus the categories listed above it.
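The difference between Percent, Valid Percent, and Cumulative Percent described above can be reproduced from the raw counts. The sketch below uses the marital status frequencies from the output (20 single, 18 married, 11 divorced, 1 missing); the variable names are my own.

```python
# Marital status counts from the frequency table: 49 valid cases, 1 missing.
counts = {"single": 20, "married": 18, "divorced": 11}
n_missing = 1

n_valid = sum(counts.values())
n_total = n_valid + n_missing

rows = []
running = 0
for category, freq in counts.items():
    running += freq
    rows.append((
        category,
        freq,
        round(100 * freq / n_total, 1),     # Percent: base is ALL participants
        round(100 * freq / n_valid, 1),     # Valid Percent: base is nonmissing cases
        round(100 * running / n_valid, 1),  # Cumulative Percent: this row plus rows above
    ))

for row in rows:
    print(row)
```

The only difference between the Percent and Valid Percent columns is the denominator (50 versus 49), which is why the two columns are identical for variables with no missing data.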

Fig. E.8

Fig. E.9

Ch. 4 Output 4.1

DESCRIPTIVES VARIABLES=height pheight hrstv hrsstudy currgpa evalinst evalprog evalphys evalsocl hrswork
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS.

Descriptives

Fig. E.10

Descriptive Statistics

                                    N   Minimum   Maximum      Mean   Std. Deviation   Skewness   Skewness Std. Error
student height in inches           50     60.00     75.00   67.3000          3.93959       .163                  .337
same sex parent's height           50     58.00     76.00   66.7800          5.10418       .333                  .337
amount of tv watched per week      50         4        25     11.98            6.096       .645                  .337
hours of study per week            50         2        38     15.62            8.310       .950                  .337
student's current gpa              50       2.4       4.0     3.172            .3907       .147                  .337
positive evaluation, institution   50         2         5      3.38             .945       .059                  .337
positive evaluation, major         49         1         5      3.27             .953      -.115                  .340
positive evaluation, facilities    50         1         5      2.76            1.061      -.136                  .337
positive eval, social life         50         1         5      3.10            1.182       .031                  .337
hours per week spent working       49         0        50     26.12           14.857      -.516                  .340
Valid N (listwise)                 48
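The skewness guideline from Problem 4.1 can be applied to the table above with a short Python sketch. The skewness values are copied from the output; pairing them with the SPSS variable names (in the order given in the DESCRIPTIVES command) is my assumption. The standard error function uses the usual large-sample formula for the standard error of skewness, which I am assuming is what SPSS reports because it reproduces the .337 and .340 shown in the table.

```python
import math

# Skewness statistics copied from the Descriptive Statistics table.
# Mapping to variable names assumes table rows follow the command's order.
skewness = {
    "height": 0.163, "pheight": 0.333, "hrstv": 0.645, "hrsstudy": 0.950,
    "currgpa": 0.147, "evalinst": 0.059, "evalprog": -0.115,
    "evalphys": -0.136, "evalsocl": 0.031, "hrswork": -0.516,
}

def approximately_normal(skew):
    """Guideline from the text: skewness between -1 and 1."""
    return -1 < skew < 1

def skewness_std_error(n):
    """Standard error of skewness for sample size n (assumed formula)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

not_normal = [name for name, s in skewness.items() if not approximately_normal(s)]

print("Outside the guideline:", not_normal)
print("SE for n = 50:", round(skewness_std_error(50), 3))
print("SE for n = 49:", round(skewness_std_error(49), 3))
```

Hours of study per week (.950) is the closest to the cutoff, so it is the variable most worth inspecting with a histogram before relying on the guideline alone.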

Ch. 4 Output 4.3

FREQUENCIES VARIABLES=gender marital age children tvsitcom tvmovies tvsports tvnews
  /ORDER=ANALYSIS.

Frequencies

Frequency Table

gender of student

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    males               26      52.0            52.0                 52.0
         females             24      48.0            48.0                100.0
         Total               50     100.0           100.0

marital status

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    single              20      40.0            40.8                 40.8
         married             18      36.0            36.7                 77.6
         divorced            11      22.0            22.4                100.0
         Total               49      98.0           100.0
Missing  System               1       2.0
Total                        50     100.0

age group

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    less than 22        17      34.0            34.0                 34.0
         22-29               18      36.0            36.0                 70.0
         30 or more          15      30.0            30.0                100.0
         Total               50     100.0           100.0

does subject have children

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    no                  24      48.0            48.0                 48.0
         yes                 26      52.0            52.0                100.0
         Total               50     100.0           100.0

television shows-sitcoms

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    no                  18      36.0            36.0                 36.0
         yes                 32      64.0            64.0                100.0
         Total               50     100.0           100.0

television shows-movies

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    no                  32      64.0            64.0                 64.0
         yes                 18      36.0            36.0                100.0
         Total               50     100.0           100.0

television shows-sports

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    no                  24      48.0            48.0                 48.0
         yes                 26      52.0            52.0                100.0
         Total               50     100.0           100.0

television shows-news shows

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid    no                  27      54.0            54.0                 54.0
         yes                 23      46.0            46.0                100.0
         Total               50     100.0           100.0

Chapter5/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 5 – Data File Management Study Guide

OBJECTIVES: The student will be able to:

1. Explain why data transformations might be necessary.
2. Count data.
3. Recode and relabel data.
4. Compute scale scores using either the numeric expression or function features of the SPSS Compute Variable command.
5. Check transformed data for errors and normality.

TERMINOLOGY:
• data transformation
• file management
• summated variable (composite variable, scale score)
• count
• recode
• reverse code
• relabel
• compute

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter5/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 5 – Data File Management Chapter Outline

I. Problem 5.1: Count Math Courses Taken

A. Follow the directions in the book to use the Count command to determine how many math courses the participants took.

II. Problem 5.2: Recode and Relabel Mother’s and Father’s Education

A. Recode is useful to either reduce the number of levels of a variable or to combine two or more small groups or categories of a variable.

B. Follow the directions in the book to use the Recode command to change the levels of a variable.

III. Problem 5.3: Recode and Compute Pleasure Scale Score

A. A scale score can be computed by taking the average of several variables.

B. Follow the directions in the book to compute a scale score from several items.

IV. Problem 5.4: Compute Parents Revised Education with the Means Command

A. Follow the directions in the book to compute a new variable.

V. Problem 5.5: Check for Errors and Normality for the New Variables

A. Follow the directions in the book to utilize the Descriptives command to check the new variables.

VI. Saving the Updated HSB Data File

A. Follow the directions in the book to save the recodes.

Chapter5/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 5 – Data File Management

Using the college student data, solve the following problems:

5.1 Compute a new variable labeled average overall evaluation (aveEval) by computing the average score (evalinst + evalprog + evalphys + evalsocl)/4.

• Select Transform => Compute and type aveEval as the Target Variable.
• Type or click the formula shown above into the Numeric Expression box. => OK.

You should check the new variable to make sure you typed the formula correctly. You can compute a few values by hand by examining these four variables in the Data View, and/or run Descriptives on the new variable, aveEval, to check whether the results seem reasonable. Valid N (listwise) = 49; Minimum = 1.75; Maximum = 4.25; Mean = 3.1224.

5.3 Count the number of types of TV shows that each student watches.

• Select Transform => Count.
• Move tvsitcom, tvmovies, tvsports, and tvnews into the Numeric Variables box.
• Name the Target Variable TVShows and label it Number of types of TV shows watched.
• Click Define Values => type “1” => Add => Continue => OK.

Each of the four types of TV shows is coded 1 for yes, watch them, or 0 for no, don’t watch them, so the above commands count the number of different types of shows watched, from 0 (none of them) to 4 (all four). The COUNT can be evaluated visually by inspecting the Data View and/or by running Frequencies on the new variable. The mean number of types of TV shows watched is 1.98 and the mode is 2.00. Fourteen students watch 1 type of TV show; 23 students watch 2 types of TV shows; 13 students watch 3 types of TV shows.

Fig. E.11

Ch. 5 Output 5.1

COMPUTE aveEval = (evalinst+evalprog+evalphys+evalsocl)/4.
EXECUTE.
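The COMPUTE expression above can be mirrored in Python as a rough sketch. One behavior worth noting is that an arithmetic expression in SPSS returns a missing result when any component is missing, which is consistent with aveEval having 49 valid cases rather than 50. The function below imitates that behavior with `None`; the two example respondents are hypothetical.

```python
def ave_eval(evalinst, evalprog, evalphys, evalsocl):
    """Average of the four evaluation items; None if any item is missing."""
    items = [evalinst, evalprog, evalphys, evalsocl]
    if any(item is None for item in items):
        return None
    return sum(items) / 4

# Hypothetical respondents, invented for illustration.
complete = ave_eval(4, 3, 2, 4)        # all four items answered
incomplete = ave_eval(4, None, 2, 4)   # one item missing

print(complete, incomplete)
```

Checking a few cases by hand like this is the same idea as inspecting the four variables in the Data View.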

Fig. E.12

Fig. E.13

Ch. 5 Output 5.3

COUNT TVShows = tvsitcom tvmovies tvsports tvnews (1).
VARIABLE LABELS TVShows 'Number of types of TV shows watched'.
EXECUTE.
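As a sketch of what the COUNT command does, the Python function below counts how many of the four TV variables equal 1 for a single student. The second half recomputes the mean and mode from the frequencies reported for Problem 5.3 (14, 23, and 13 students watching 1, 2, and 3 types). The example student is hypothetical.

```python
def count_tv_shows(tvsitcom, tvmovies, tvsports, tvnews):
    """Count how many of the four yes/no variables are coded 1 (yes)."""
    return sum(1 for value in (tvsitcom, tvmovies, tvsports, tvnews) if value == 1)

# Hypothetical student who watches sitcoms and news only.
example = count_tv_shows(1, 0, 0, 1)

# Distribution of TVShows reported in the text: value -> number of students.
distribution = {1: 14, 2: 23, 3: 13}
n = sum(distribution.values())
mean = sum(value * freq for value, freq in distribution.items()) / n
mode = max(distribution, key=distribution.get)

print(example, n, mean, mode)
```

The reported frequencies account for all 50 students and reproduce the stated mean of 1.98 and mode of 2, which is a quick way to confirm the count was defined correctly.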

Appendix A.doc

Appendix A

Data Collection

This Appendix and Appendix B are designed to provide introductory background information on developing a questionnaire, collecting data with it, and getting the data ready to enter and analyze. This appendix also explains the importance of understanding your data, how to develop a questionnaire, and how to set up data coding sheets.

There is a class assignment in this Appendix that will provide an opportunity for your class to develop a short questionnaire, collect data using it, and get the data ready for entry into SPSS. If the class develops a set of questions, each person in the class should answer them. There will be five standard questions and then an opportunity for the class to add additional questions. However, if you are using this text as an individual learning tool or your professor wants to skip the topic of questionnaire development, there is a set of data on the CD-ROM (and in Appendix B) for you to enter into the SPSS data editor.

Knowing Your Data

It is important for you to have a clear understanding of the data in your data set. For example, in the class exercise one of the variables will be height. You could look around the room and intuitively know the data for that variable. You could line your class up by height to see the distribution and you could make fairly close approximations of statistics such as the average (mean), middle person (median), and tallest minus shortest (range). A few additional very tall people would skew the data positively. Conversely, several extra short people would skew the data negatively.
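The effect of a few extreme heights can be seen in a small Python sketch. All heights below are made up for illustration. The sign of the third central moment is used here as a simple indicator of skew direction; it is not the adjusted skewness statistic that SPSS reports.

```python
import statistics

def third_central_moment(values):
    """Average cubed deviation from the mean; its sign shows skew direction."""
    m = statistics.mean(values)
    return sum((v - m) ** 3 for v in values) / len(values)

# Hypothetical class heights in inches (symmetric around 67).
heights = [64, 65, 66, 67, 68, 69, 70]

with_tall = heights + [80, 82]    # add two very tall people
with_short = heights + [50, 52]   # add two very short people

for label, data in [("original", heights), ("tall added", with_tall),
                    ("short added", with_short)]:
    print(label,
          "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data),
          "range:", max(data) - min(data),
          "third moment:", round(third_central_moment(data), 1))
```

With the tall people added, the mean is pulled above the median (positive skew); with the short people added, it is pulled below the median (negative skew).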

Students do not always have a clear understanding of the data they analyze. In your research you may receive a data set from a colleague, agency, or faculty member, where you do not really know the underlying meaning of the variables. For example, in chapter 6 you will receive a set of scores from the High School and Beyond (HSB) data, which includes a variable labeled visualization score. You will be told that it is a test of a person's ability to determine how a three-dimensional object would look if its spatial position were changed. But do you really know what that means? Hopefully, when abstract variables (such as visualization score, self-concept, or intrinsic motivation) are used in a research report, how they were measured (operationally defined) will be described completely enough for you to grasp the meaning.

In this lab assignment (Appendix C), we will use data generated by your class (or the similar data provided) so you will have a sound understanding of the meaning of each variable. This kind of understanding should help you learn SPSS and the statistical tests that we will demonstrate better than data sets which may not have as much meaning to you. You will also gain experience entering data and labeling variables.

Developing a Questionnaire

This appendix uses a questionnaire as a data collection device due to its relative simplicity. However, even a questionnaire poses issues for a researcher. Appendix B presents several topics related to questionnaire development and provides some guidance about writing good items. Appendix B also has a sample of the resulting questionnaire and a mock data set which you can use if your class does not do the exercise suggested in this appendix. If you will be developing a questionnaire in class and are not already knowledgeable about doing that, we suggest that you read Appendix B now.

Survey Questions Common to All Classes

In order to have a few variables to illustrate in the next chapter, we suggest that your class use the following five common variables: student’s height, same sex parent’s height, gender, marital status, and age. The questions and suggested response choices are as follows:

· What is your height in inches? ________

· What is the estimated height of your same sex parent? ________

· Gender (circle one number)

1. Male

2. Female

· What is your marital status? (circle one number)

1. Single, never married

2. Married

3. Divorced, separated, or widowed

· What age group are you in? (circle one number)

1. Less than 22

2. 22-29

3. 30 or more

Class Exercise

As a class, develop a short questionnaire that uses the five questions above and asks other questions about your peers that are of interest to you. With guidance from your instructor, add approximately five questions to the five questions specified above. Try to include a variety of questions. Some examples of other questions the authors’ classes have used include:

How many hours a week do you watch TV? __________

What type of television shows do you watch? (Check all that apply)

Sitcoms

Movies

Sports

News

I feel that the current program I am enrolled in is meeting my needs. (circle one)

1. Strongly disagree

2. Disagree

3. Neutral

4. Agree

5. Strongly agree

How many hours a week do you study? __________

Do you have children? ________Yes ________No

See Appendix B for a sample questionnaire.

Students often like to ask questions on surveys that require respondents to answer with a phrase or a word. For example: What is your favorite car? Or what state were you born in? These questions are usually answered by spelling out your response, such as Chevrolet or Ford. This type of response is called alphanumeric. Although written answers can be very useful in survey research, for this exercise, we recommend that you do not use questions that require alphanumeric responses because of the difficulty of coding and classifying these types of responses.

Once your class develops a questionnaire of approximately 10 questions (the five above plus the five your class generates), either a student or the instructor should make the questions available to the class. The instructor can write the questions on a flip chart, the board, or duplicate them. Each student should have an answer sheet numbered 1 through 10, or have some other method to record their responses.

After the class completes the questionnaire, there will be an answer sheet for each student. The instructor may provide each student with a set of raw data about the class. That is, each student may receive data for every person in the class. If you have a class of 20, you would begin this phase of the assignment with 20 sets of answers for approximately 10 questions.

Next, you will prepare to enter the data on these answer sheets into SPSS. This can be done in several ways.

Number each of your questionnaires. This will prevent confusion if you drop or mix up the stack of questionnaires. SPSS automatically labels cases from 1 through the last case or subject. Therefore, it is a good idea for you to number the questionnaires beginning with one, rather than using letters or an alternative identification system. This process will also keep the respondents' answers anonymous and thus protect against violating the privacy of the human subjects in your research.

It is common for researchers to transfer the data from the questionnaires to a coding sheet by hand before entering the data into Excel, Word, or SPSS. This is helpful if there is not a separately numbered answer sheet, if the responses are to be entered in a different order than on the questionnaire, or if additional coding or recoding is required before data entry. In these cases, you could make mistakes entering the data directly from the questionnaires. On the other hand, you could make copying mistakes or take more time transferring the data from the questionnaires or answer sheets to the coding sheet. Thus, there are advantages and disadvantages to using a coding sheet as an intermediate step between the questionnaire and the SPSS data editor. The data set for your class will be small and straightforward enough that you will be able to keep track of the data fairly easily going directly from your questionnaire into the SPSS editor as explained in Appendix C.

Appendix C gives you a step by step approach of how to do this for the first five variables. You then can use that knowledge to add the additional five variables developed by your class. If you did not collect data as a class project, use the data set provided in Appendix C.
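Whichever route you take, entry mistakes are easier to trace when the questionnaires are numbered. The following is a minimal sketch of a range check, written in Python rather than SPSS. The heights, the case numbers, and the plausible limits of 48 to 84 inches are all assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical entries keyed by questionnaire number;
# 650 and 720 stand for a mistyped 65.0 and 72.0.
heights = {1: 67, 2: 650, 3: 61, 4: 720, 5: 65}

PLAUSIBLE = (48, 84)  # assumed limits for adult height in inches

# Keep only the cases whose value falls outside the plausible range.
suspect = {case: h for case, h in heights.items()
           if not PLAUSIBLE[0] <= h <= PLAUSIBLE[1]}

print("mean before checking:", mean(heights.values()))
print("cases to look up on the questionnaires:", suspect)
```

An implausible mean is often the first sign of such errors, and the case numbers show which questionnaires to pull from the stack.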

Interpretation Questions

1. In the first SPSS run of the class data, you noticed that the average student height was 78 inches. What does this tell you about your coding of the data?

2. Describe the advantages and disadvantages of developing and using a coding sheet.

3. Why should you number your questionnaires?

Appendix B.doc

APPENDIX B

Developing a Questionnaire and a Data Set

Questionnaires are commonly used in research to collect data that provide information about knowledge, perceptions, and attitudes. The information collected reflects the subject’s attitudes, knowledge, and so on at the time that the questionnaire is completed, but may include recollections of the past or predictions of future behavior. This appendix provides a brief overview about developing a questionnaire. A more complete discussion is provided by Salant and Dillman (1994) and Dillman and Smyth (2007).

Deciding What Information You Want

Before writing any questions for a questionnaire, it is important to think through how you will analyze the data. For example, you might want to know what cars are the most popular among your peers. An open-ended question to answer this would be: “What is your favorite vehicle?” Your classmates may simply write the name of their favorite vehicle. The following might be responses to this question:

Volkswagen

Ford

Truck

Corvette

Yacht

Ski lift

Convertible

The use of the word "vehicle" allows responses like “ski lift” and “yacht”. Thus, the way in which a researcher words a question can greatly impact the type of answers respondents provide.

Wording questions in a vague manner can cause other problems. As a researcher you have to decide how to input the data into SPSS. The vagueness of this “vehicle” example leaves the researcher with many different responses, some of which are not appropriate or the type of response that the question was intended to generate. It can be helpful to have peers read your questionnaire before proceeding. Many unclear questions or problems with wording can be found and changed before you end up with meaningless data.

Methods of Administration

After creating your questionnaire, the next step is to decide how to give or administer it to the subjects. Questionnaires can be administered using several different methods. The researcher could collect information via the telephone, mail, e-mail, individual face-to-face, or group administration. The face-to-face method usually is used when obtaining in-depth information requires more time or explanation than is possible using mail or telephone. The group method can be used in situations such as a club meeting or a college classroom where a researcher or a surrogate administers and collects the questionnaires from the group.

If your class is being taught via distance education, the class questionnaire data will most likely be collected via e-mail or mail. If your class is being offered in the traditional classroom format, then the data will most likely be collected via the group method.

Sampling Techniques

A variety of sampling techniques can be used. They fall into two broad kinds: probability and nonprobability. This appendix does not allow for a full explanation of these methods. You can find more information about sampling in a research methods book such as Gliner and Morgan (2000).

A probability sample is defined as a selection of participants where every person in a population has a known, nonzero chance of being chosen. There are four probability sampling techniques: simple random, systematic with a random start, cluster, and stratified. These techniques are the most powerful in that the data collected from samples using these methods are more likely to be generalizable to larger populations.
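As a minimal sketch of three of these techniques, Python's standard random module can draw simple random, systematic, and stratified samples. The roster of 20 numbered classmates, the two strata, and the sample sizes are assumptions chosen for illustration and do not come from the text.

```python
import random

rng = random.Random(42)  # fixed seed so the draws can be repeated

# Hypothetical population: questionnaire numbers 1 through 20.
population = list(range(1, 21))

# Simple random sample: every person has the same chance of selection.
simple = rng.sample(population, 5)

# Systematic sample with a random start: pick a random starting point
# within the first interval, then take every k-th person.
k = len(population) // 5          # sampling interval (here 4)
start = rng.randrange(k)          # random start in 0..k-1
systematic = population[start::k]

# Stratified sample: sample separately within each subgroup (stratum).
# The two strata below are an assumed split, for illustration only.
strata = {
    "undergraduate": population[:12],
    "graduate": population[12:],
}
stratified = {name: rng.sample(members, 2) for name, members in strata.items()}

print(simple, systematic, stratified)
```

In each case the chance of selection is known in advance, which is what distinguishes these methods from the nonprobability techniques.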

A nonprobability sample is one in which participants do not have a known, nonzero chance of being chosen. Thus, bias is usually introduced into the study. Three types of nonprobability sampling techniques are quota, convenience, and purposive. Unfortunately, most student research, including the class questionnaire suggested in chapter 4, uses nonprobability sampling. With nonprobability sampling, making generalizations beyond the group studied may be problematic.

Writing Good Questions

In this section, we will describe briefly some of the main types of questionnaire (or survey) items and questions.

Likert scales. One of the most common types of items used to collect quantitative data is the Likert scale, which is sometimes referred to as an “ordered response scale”. Most likely you have used Likert scales many times. An example of a Likert scale, which might have been used for the class questionnaire is:

· Research Methods is my favorite subject. Circle the response which most represents your feelings:

Strongly Agree Agree Undecided Disagree Strongly Disagree

When using a Likert scale the researcher has some choices. The first choice is whether the scale should include a middle choice like undecided or whether the question should be written in a manner that forces the respondent to either agree or disagree. Another decision is whether to provide a numerical scale under the wording. For example, the question as stated above could be used with a numerical scale added.

Strongly Agree Agree Undecided Disagree Strongly Disagree

5 4 3 2 1

Researchers disagree about whether this type of numerical addition is desirable. Sometimes researchers add the numerical scale when they input the data but do not show this numbering on the questionnaire. In standardized questionnaires it is common to have a number of Likert type items and often the words are at the top of the column and only the numbers are opposite each item.

Semantic Differential scales. Another way to collect quantitative data is with a semantic differential scale, which uses bipolar adjectives (adjectives that are opposite of one another). These adjectives can be Activity pairs, Evaluative pairs or Potency pairs. An example of an Activity pair would be “active – passive,” or “fast – slow.” Evaluative pairs include words such as “good – bad,” or “dirty – clean.” Examples of Potency pairs include comparisons such as “large – small” or “hot – cold.” The evaluative type is used most often in quantitative research.

An example of part of a semantic differential scale that might be used for your class questionnaire is:

· The research course I am currently taking is: (circle the number that best reflects your view)

Bad 1 2 3 4 5 6 7 Good

Large 1 2 3 4 5 6 7 Small

Worthless 1 2 3 4 5 6 7 Valuable

When creating a semantic differential scale, it is best to switch some of the positive choices from the right side to the left side. Notice in the above example, the right side includes positive words, such as “good” and “valuable.” The left side includes words that are thought of as being more negative, such as “bad” and “worthless.” The middle pair is switched, with the more negative word, “small,” on the right and the more positive word, “large,” on the left. Switching the words will help ensure that the subject reads the words, and does not just circle “7” for each answer.
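When a pair has been switched like this, its scores are usually reversed before the items are combined, so that a high number means the same thing on every item. The following is a minimal sketch, assuming the 7-point scale above and treating the Large/Small pair as the one to reverse. The function name and the respondent's answers are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 7

def reverse_score(value, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a rating so that 1 becomes 7, 2 becomes 6, and so on."""
    if not low <= value <= high:
        raise ValueError(f"rating {value} is outside {low}-{high}")
    return low + high - value

# Hypothetical answers from one respondent (not data from the text).
answers = {"bad_good": 6, "large_small": 2, "worthless_valuable": 7}

# Only the switched pair is reversed; here a 2 (near "Large") becomes a 6.
recoded = dict(answers)
recoded["large_small"] = reverse_score(answers["large_small"])
print(recoded)
```

Keeping the original answers and writing the reversed value to a copy makes it easy to confirm later which items were recoded.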

Checklists. Another type of data collection for surveys is a checklist. Checklists are words that are listed so the subject can mark the ones that apply. Usually subjects are asked to mark all that apply, thus there can be multiple words selected. An example of a checklist is:

· What type of television shows do you watch? Please check all that apply.

· Sitcoms

· Movies

· Sports

· News
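Because several boxes can be checked, a checklist is usually entered as one yes/no variable per choice, as the codebook in Table B.1 does with TVSITCOM, TVMOVIES, TVSPORTS, and TVNEWS (0 = no, 1 = yes). The following is a minimal sketch of that coding. The function name and the respondent's answers are hypothetical.

```python
# Variable names follow Table B.1; the respondent's answers are hypothetical.
CHOICES = {
    "Sitcoms": "tvsitcom",
    "Movies": "tvmovies",
    "Sports": "tvsports",
    "News": "tvnews",
}

def code_checklist(checked):
    """Return one 0/1 variable per choice (1 = checked, 0 = not checked)."""
    unknown = set(checked) - set(CHOICES)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    return {var: int(label in checked) for label, var in CHOICES.items()}

row = code_checklist(["Sitcoms", "News"])
print(row)  # {'tvsitcom': 1, 'tvmovies': 0, 'tvsports': 0, 'tvnews': 1}
```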

Rankings. Rankings are another type of survey question. With ranking questions subjects are asked to rank or place in order a number of choices. Ranked items are relatively easy to make and for participants to answer as long as they are asked to rank only a few (for example three or four) items. However, ranking items are not so easy to handle statistically. Two problems that may occur are 1) the respondents may not rank all the items and 2) they produce ordinal data which eliminates the use of parametric statistics (such as t test and correlation). An example of a ranking scale is:

· Rank the following types of television shows in terms of how much you would like to watch them (1 = most preferred, 4 = least preferred). Please use each number only once and use all four numbers.

Sitcoms ______

Movies ______

Sports ______

News ______
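The first problem, incomplete or repeated ranks, can be caught before data entry. The following is a minimal sketch, assuming the four show types above, with a blank recorded as None. The function name and the three sets of responses are hypothetical.

```python
ITEMS = ["Sitcoms", "Movies", "Sports", "News"]

def ranking_is_complete(ranks):
    """True only if every item has a rank and 1-4 are each used once."""
    values = [ranks.get(item) for item in ITEMS]
    return sorted(v for v in values if v is not None) == [1, 2, 3, 4]

good = {"Sitcoms": 2, "Movies": 1, "Sports": 4, "News": 3}
skipped = {"Sitcoms": 1, "Movies": 2, "Sports": None, "News": 3}
tied = {"Sitcoms": 1, "Movies": 1, "Sports": 2, "News": 3}

for name, ranks in [("good", good), ("skipped", skipped), ("tied", tied)]:
    print(name, ranking_is_complete(ranks))
```

Responses that fail the check can be set aside or treated as missing, depending on the rule the class agrees on.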

Open-ended. The last type of survey question discussed in this appendix is the open-ended question. Unlike the other types of survey questions discussed earlier, open-ended questions do not provide choices for the subject to select. Each question is worded so that the subject must generate an answer. An example of an open-ended question would be:

· Do you have additional comments?

“How many hours a week do you watch television?” and “What is your height in inches?” are also technically open-ended questions, but they require only a single number for an answer. There are also partially open-ended questions, which list several possible response choices to pick from but also include an open-ended choice such as:

· Other, please specify___________________________________

Open-ended questions can be difficult to code. Also, respondents may find open-ended questions more difficult than other types of questions because they require more thinking, so they may skip them. There are also advantages to using open-ended questions. They give subjects the opportunity to write whatever they want, giving them more freedom to say how they really feel about a topic. They can also give the researcher more in-depth insight into how the subjects actually feel.

Sample Questionnaire, Codebook, and Data

The following figure and two tables include data to be used if you did not develop a questionnaire and collect data in your class as recommended in chapter 4. Figure B.1 is a sample of how such a printed questionnaire might look. Table B.1 is the codebook and Table B.2 is the raw data.

About You and Your Family

What is your height in inches?______

What is the estimated height of your same sex parent? ______

What is your gender? (circle one number)

1. Male

2. Female

What is your marital status? (circle one number)

1. Single, never married

2. Married

3. Divorced, separated, or widowed

What age group are you in? (circle one number)

1. Less than 22

2. 22-29

3. 30 or more

Do you have children?

1. Yes

2. No

How many hours a week do you watch television? ______

What type of television shows do you watch? Please check all that apply.

· Sitcoms

· Movies

· Sports

· News

How many hours a week do you study? ______

What is your current grade point average? ______

Please rate the following four statements according to your evaluation (circle one number).

I feel that the institution I am attending is great.

1. Strongly disagree

2. Disagree

3. Neutral

4. Agree

5. Strongly agree

I feel that the current program in which I am enrolled is meeting my needs.

1. Strongly disagree

2. Disagree

3. Neutral

4. Agree

5. Strongly agree

I feel the institution I am attending has good physical facilities.

1. Strongly disagree

2. Disagree

3. Neutral

4. Agree

5. Strongly agree

I feel that the institution I am attending provides mostly horrible social activities.

1. Strongly disagree

2. Disagree

3. Neutral

4. Agree

5. Strongly agree

How many hours a week do you work? ______

Fig. B.1. Sample questionnaire.

Table B.1. Codebook

List of variables on the working file

Name Position

HEIGHT student height in inches 1

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8.2

Write Format: F8.2

PHEIGHT same sex parent's height 2

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8.2

Write Format: F8.2

GENDER gender of student 3

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 males

1 females

MARITAL marital status 4

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 single

2 married

3 divorced

AGE age group 5

Measurement Level: Ordinal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 less than 22

2 22-29

3 30 or more

CHILDREN does subject have children 6

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 no

1 yes

HRSTV amount of tv watched per week 7

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

TVSITCOM television shows-sitcoms 8

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 no

1 yes

TVMOVIES television shows-movies 9

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 no

1 yes

TVSPORTS television shows-sports 10

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 no

1 yes

TVNEWS television shows-news shows 11

Measurement Level: Nominal

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

0 no

1 yes

HRSSTUDY hours of study per week 12

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

CURRGPA student's current gpa 13

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8.1

Write Format: F8.1

EVALINST evaluation of current institution 14

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 strongly disagree

2 disagree

3 neutral

4 agree

5 strongly agree

EVALPROG evaluation of major program of study 15

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 strongly disagree

2 disagree

3 neutral

4 agree

5 strongly agree

EVALPHYS evaluation of physical facilities of institution 16

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 strongly disagree

2 disagree

3 neutral

4 agree

5 strongly agree

EVALSOCL negative evaluation of social life (This variable has been reversed, see Appendix G. It is now positive.) 17

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Value Label

1 strongly disagree

2 disagree

3 neutral

4 agree

5 strongly agree

HRSWORK hours per week spent working 18

Measurement Level: Scale

Column Width: 8 Alignment: Right

Print Format: F8

Write Format: F8

Table B.2. Appendix B Data

	height	pheight	gender	marital	age	children	hrstv	tvsitcom
1	67	66	1	.	1	0	18	1
2	72	72	0	2	3	1	4	0
3	61	62	1	3	3	1	25	1
4	71	71	0	1	2	0	6	0
5	65	63	1	1	1	0	8	1
6	67	68	0	1	2	0	10	0
7	69	74	0	1	2	0	24	1
8	75	73	0	2	3	1	14	0
9	62	65	1	3	3	0	5	1
10	61	64	1	2	3	1	5	1
11	64	63	1	2	2	1	10	1
12	64	64	0	3	2	1	6	0
13	70	65	0	1	2	0	20	0
14	63	62	1	2	3	0	7	1
15	64	60	1	1	2	0	4	1
16	63	60	1	2	3	1	10	1
17	65	58	1	2	3	1	10	1
18	71	76	0	2	3	1	6	1
19	72	75	0	2	3	1	9	0
20	68	65	1	1	2	0	8	1
21	75	73	0	1	1	0	20	0
22	67	66	1	3	1	1	4	1
23	69	74	0	1	1	0	10	1
24	67	68	0	1	1	0	8	0
25	64	63	1	1	1	0	14	1
26	64	62	1	3	2	1	8	1
27	70	65	0	3	2	1	17	0
28	64	64	0	1	1	0	8	0
29	70	75	0	1	1	0	14	0
30	72	75	0	3	1	1	12	0
31	64	62	1	2	2	1	15	1
32	71	73	0	3	3	1	10	0
33	67	66	1	2	3	1	8	1
34	63	60	1	2	3	1	6	0
35	69	68	0	1	1	0	20	1
36	68	66	0	3	2	1	15	1
37	64	63	1	1	1	0	15	1
38	70	68	0	1	1	0	7	1
39	71	74	0	2	3	1	6	0
40	72	73	0	1	1	0	20	1
41	60	62	1	3	2	1	10	1
42	65	63	1	2	1	1	15	1
43	72	72	0	1	1	0	24	1
44	63	62	1	1	2	0	18	1
45	75	74	0	2	2	1	10	1
46	71	68	0	1	1	0	24	1
47	65	59	1	2	2	1	5	0
48	69	67	0	2	2	1	15	1
49	63	62	1	2	2	1	10	0
50	67	66	1	3	3	0	22	1
tvmovies	tvsports	tvnews	hrsstudy	currgpa	evalinst	evalprog	evalphys	evalsocl	hrswork
1	0	0	5	3.2	3	3	4	2	0
0	1	0	24	2.6	3	.	3	2	43
1	0	0	20	3	4	4	4	4	.
0	1	1	17	3.1	3	4	3	3	37
1	0	1	18	3	3	3	4	4	35
0	1	0	10	3	3	2	3	4	38
0	1	1	12	3	4	4	4	5	0
0	1	0	10	3.6	3	3	3	5	20
1	0	1	8	3.4	3	4	3	4	40
1	0	1	17	3.5	2	2	2	3	38
1	0	1	38	3	4	3	3	2	28
0	1	0	35	2.4	2	3	3	4	40
0	1	0	20	3	4	4	4	5	0
1	0	1	10	3.8	3	4	3	2	30
1	0	0	12	3.3	3	4	5	3	45
1	0	0	35	3.7	3	3	3	2	43
0	0	1	30	3.4	3	3	3	4	34
0	1	0	22	3.7	4	4	4	4	36
0	1	0	20	3.6	4	3	4	5	30
1	0	0	12	3	3	3	3	4	29
0	1	1	2	2.5	3	4	2	4	0
1	0	0	30	3	2	2	2	3	21
0	1	1	12	3.4	2	2	2	4	23
0	1	1	10	2.5	4	3	3	3	26
1	0	0	15	3	2	2	2	3	12
1	0	0	30	3.1	2	2	3	3	15
0	1	1	18	2.4	4	3	2	3	5
0	1	1	12	2.9	2	1	3	4	35
0	1	0	4	3	3	3	2	3	7
0	1	1	8	2.9	2	2	5	2	10
1	0	0	15	3.2	3	3	3	3	40
0	1	1	20	3.8	2	3	5	3	42
0	0	0	12	3.8	1	1	5	1	40
0	0	1	22	4	1	2	3	3	35
0	0	0	7	2.8	2	2	1	2	0
0	1	1	12	2.9	3	3	2	2	28
1	0	0	18	3.2	2	2	4	2	0
0	1	0	13	3	2	3	3	4	40
0	0	1	10	3.6	1	2	3	1	50
0	1	0	8	2.8	4	4	1	4	40
1	1	0	20	3.2	2	2	5	1	20
0	0	0	10	3.4	2	3	3	1	15
0	1	1	8	2.9	1	1	3	2	38
1	0	0	15	3.2	2	2	5	1	10
0	1	0	10	3	2	2	3	2	20
0	1	1	5	2.9	4	5	2	3	37
0	0	1	12	3.6	1	2	5	1	40
0	1	1	18	3.3	2	2	3	2	35
0	0	1	10	3.1	3	2	5	3	10
1	1	0	20	3.9	1	1	4	1	20
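In Table B.2 a period marks a missing value, which is how SPSS displays a blank numeric cell. If the rows are read outside SPSS, the period has to be treated as missing rather than as a number. The following is a minimal sketch using the first three rows of the first block of the table. The column name "id" for the unlabeled case-number column and the parsing function are assumptions made for illustration.

```python
# First three rows of the first block of Table B.2, tab-separated.
# "id" is an assumed name for the unlabeled case-number column.
HEADER = "id\theight\tpheight\tgender\tmarital\tage\tchildren\thrstv\ttvsitcom"
ROWS = [
    "1\t67\t66\t1\t.\t1\t0\t18\t1",
    "2\t72\t72\t0\t2\t3\t1\t4\t0",
    "3\t61\t62\t1\t3\t3\t1\t25\t1",
]

def parse_row(line, names):
    """Convert one row to a dict, storing None wherever a '.' appears."""
    cells = line.split("\t")
    return {n: (None if c == "." else int(c)) for n, c in zip(names, cells)}

names = HEADER.split("\t")
cases = [parse_row(line, names) for line in ROWS]

# Statistics should skip missing values instead of counting them as zero.
marital = [c["marital"] for c in cases if c["marital"] is not None]
print(len(cases), "cases;", len(marital), "with marital status recorded")
```

The second block of the table would need a decimal-aware conversion for currgpa, which this sketch does not attempt.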

Appendix C.doc

APPENDIX C

Making Tables and Figures

Don Quick

Colorado State University

Tables and figures are used in most fields of study to provide a visual presentation of important information to the reader. They are used to organize the statistical results of a study, to list important tabulated information, and to allow the reader a visual method of comparing related items. Tables offer a way to detail information that would be difficult to describe in the text.

A figure is a graphic or pictorial representation, such as a chart, graph, photograph, or line drawing. These figures may include pie charts, line charts, bar charts, organizational charts, flow charts, diagrams, blueprints, or maps. Limit figures to situations in which a visual helps the reader understand the methodology or results. Use a table to provide specific numbers and summary text, and use figures for visual presentations.

The meaning and major focus of the table or figure should be evident to readers without their having to study it thoroughly. A glance should be all it takes to convey what the table or figure represents. A reader who has only the text may have difficulty understanding the data; well-presented tables and figures make the study results easier to understand.

The purpose of this appendix is to provide guidelines that will enhance the presentation of research findings and other information by using tables and figures. It will highlight the important aspects of constructing tables and figures using the Publication Manual of the American Psychological Association, Sixth Edition (2010) as the guide for formatting.

General Considerations Concerning Tables

Be selective as to how many tables are included in the total document. Determine how much data the reader needs to comprehend the material, and then decide if the information would be better presented in the text or as a table. A table containing only a few numbers is unnecessary, whereas a table containing too much information may not be understandable. Tables should be easy to read and interpret. If at all possible, combine tables that repeat data, so that results are presented only once.

Keep all of the tables in your document consistent. All tables and figures should use a similar format, with the results organized in a comparable fashion. Use the same name and scale for a variable wherever it appears in the tables, figures, and text.

In a final manuscript such as a thesis or dissertation, adjust the column headings or spacing between columns so the width of the table fits appropriately between the margins. Fit all of one table on one page. Reduce the data, change the type size, or decrease line spacing to make it fit. A short table may be on a page with text as long as it follows the first mention of it. Each long table is on a separate page immediately after it is mentioned in the text. If the fit and appearance would be improved, turn the table sideways (landscape orientation, with the top of table toward the spine) on the page.

Each table and figure must be discussed in the text. An informative table will supplement but will not duplicate the text. In the text, discuss only the most important parts of the table. Make sure the table can be understood by itself without the accompanying text; however, it is never independent of the text. There must be a reference in the text to the table.

Construction of the Table

Table C.1 is an example of an APA table for displaying simple descriptive data collected in a study. It also appears in correct relation to the text of the document; that is, it is inserted below the place that the table is first mentioned either on the same page, if it will fit, or the next page. (Fig. C.1 shows the same table with the table parts identified.) The major parts of a table are the number, the title, the headings, the body, and the notes.

Table C.1. An Example of a Table in APA Format for Displaying Simple Descriptive Data

Table 1

Means and Standard Deviations on the Measure of Self-Direction in Learning as a Function of Age in Adult Students

Self-directed learning inventory score

Age group

20–34

35–40

50–64

65–79

80+

--a

3.5

6.3

5.6

7.1

Note. The maximum score is 100.

a No participants were found for the over 80 group.

Table Numbering

Arabic numerals are used to number tables in the order in which they appear in the text. Do NOT write in the text “the table on page 17” or “the table above or below.” The correct method would be to refer to the table number like this: (see Table 1) or “Table 1 shows…” Left-justify the table number (see Table C.1). In an article, each table should be numbered sequentially in the order of appearance. Do not use suffix letters or numbers with the table numbers in articles. However, in a book, tables may be numbered within chapters; for example, Table 7.1. If the table appears in an appendix, identify it with the letter of the appendix capitalized, followed by the table number; for instance, Table C.3 is the third table in Appendix C.

Table Titles

Include the variables, the groups on whom the data were collected, the subgroups, and the nature of the statistic reported. The table title and headings should concisely describe what is contained in the table. Abbreviations that appear in the body of the table can sometimes be explained in the title; however, it may be more appropriate to use a general note (see also comments on Table Headings). The title must be italicized. Standard APA format for journal submission requires double spacing throughout. However, tables in student papers may be partially single spaced for better presentation.

Table 1

Means and Standard Deviations on the Measure of Self-Direction in Learning as a

Function of Age in Adult Students

Inventory score

Age group

20-34

35-40

50-64

65-79

80+

--a

3.5

6.3

5.6

7.1

Note. The maximum score is 100.

a No participants were found for the over 80 group.

Fig. C.1. The major parts of an APA table.

Table Headings

Headings are used to explain the organization of the table. You may use abbreviations in the headings; however, include a note as to their meaning if you use mnemonics, variable names, and scale acronyms. Standard abbreviations and symbols for nontechnical terms can be used without explanation (e.g., no. for number or % for percent). Use a precise title, column headings, and row labels that are accurate and brief. Each column must have a heading, including the stub column, or leftmost column. Its heading is referred to as the stubhead. The stub column usually lists the significant independent variables or the levels of the variable, as in Table C.1.

The column heads cover one column, and the column spanners cover two or more columns, each with its own column head (see Table C.1 and Fig. C.1). Headings stacked in this manner are called decked heads. This is a good way to eliminate repetition in column headings, but try to avoid using more than two levels of decked heads. Column heads, column spanners, and stubheads should all be singular, unless referring to a group (e.g., children). Table spanners, which cover the entire table, may be plural. Use sentence capitalization in all headings.

Notice that there are no vertical lines in an APA style table. The horizontal lines can be added by using a “draw” feature or a “borders” feature for tables in the computer word processor, or they could be drawn in by hand if typed. If translating from an SPSS table or box, the vertical lines must be removed.

The Body of the Table

The body contains the actual data being displayed. Rounded numbers improve readability and clarity more than precise numbers with several decimal places. A good guideline is to report two digits more than the raw data. A reader can compare numbers down a column more easily than across a row. Column and row averages can provide a visual focus that allows the reader to inspect the data easily without cluttering the table. If a cell cannot be filled because the information is not applicable, then leave it blank. If it cannot be filled because the information could not be obtained, or was not reported, then insert a dash and explain the dash with a note to the table.

Notes to a Table

Notes are often used with tables. There are three different forms of notes used with tables: (a) to eliminate repetition in the body of the table, (b) to elaborate on the information contained in a particular cell, or (c) to indicate statistical significance:

· A general note provides information relating to the table as a whole, including explanations of abbreviations used:

Note. This could be used to indicate if the table came from another source.

· A specific note makes a reference to a specific row, column, or cell of the table and is given a superscript lowercase letter, beginning with the letter “a”:

ᵃn = 50. Specific notes are identified in the body with a superscript.

· A probability note is to be included when one or more inferential statistics have been computed and there isn’t a column showing the probability, p. Asterisk(s) indicate the statistical significance of findings presented within the table. Try to be consistent across all tables in a paper. The important thing is to use the fewest asterisks for the largest p value. It is common to use one asterisk for .05 and two for .01. For example:

*p < .05. **p < .01.
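The asterisk convention can be expressed as a small rule. The following is a minimal sketch, assuming the common .05 and .01 cutoffs named above. The function name is hypothetical.

```python
def significance_stars(p):
    """Return '**' for p < .01, '*' for p < .05, otherwise ''."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# The F value and p come from the ANOVA example in this appendix.
print("4.09" + significance_stars(0.021))  # 4.09*
```

Checking the smaller cutoff first gives the largest p value the fewest asterisks.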

Notes should be listed with general notes first, then specific notes, and concluded with probability notes, without indentation. They may be single spaced for better presentation. Explain all uses of dashes and parentheses. Abbreviations for technical terms, group names, and those of a similar nature must be explained in a note to the table.

Constructing a Table in Microsoft Word 2007

For this step-by-step example, results from an ANOVA analysis were chosen from previous examples in the book. See Fig. C.2. The data are transferred from the standard SPSS output to an APA table.

ANOVA

grades in h.s.

	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	18.143	2	9.071	4.091	.021
Within Groups	155.227	70	2.218
Total	173.370	72

Fig. C.2. An example of the type of default table generated from an SPSS ANOVA output.

The finished table should look like Table C.2. This explanation uses MS Word 2007. Earlier versions generally offer the same functionality, but Microsoft greatly changed the look and feel in 2007, so the screens, and how you find the right command, will differ. Also, you will need to adjust the number of columns and rows for the type and amount of data that you need to present.

Table C.2. An Example of an ANOVA Table in APA Format

Table 2

One-Way Analysis of Variance of Grades in High School by Father's Education

Source	df	SS	MS	F	p
Between groups	2	18.14	9.07	4.09	.02
Within groups	70	155.23	2.22
Total	72	173.37
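The entries in Table C.2 can be checked against the SPSS output in Fig. C.2: each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The following is a minimal sketch of that arithmetic and of the rounding used in the table, where values are shown to two decimals and p has no leading zero. The helper names are hypothetical.

```python
# Values from the SPSS output (Fig. C.2) and the df column of Table C.2.
ss_between, df_between = 18.143, 2
ss_within, df_within = 155.227, 70

ms_between = ss_between / df_between   # about 9.07
ms_within = ss_within / df_within      # about 2.22
f_ratio = ms_between / ms_within       # about 4.09

def two_decimals(x):
    """Round to two decimal places for the table body."""
    return f"{x:.2f}"

def p_value(p):
    """Two decimals with the leading zero dropped, as in Table C.2."""
    return f"{p:.2f}".lstrip("0")

print(two_decimals(ms_between), two_decimals(ms_within),
      two_decimals(f_ratio), p_value(0.021))
```

The two sums of squares also add up to the Total row, which is a quick check that the figures were copied correctly.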

Step 1: Insert the Table

The Headings and Body of the table are actually built using Word’s table function. Type your Table Number and Title. Then on the next line after the title, insert a 6 × 4 table:

· Insert → Table… (see Fig. C.3).

· For our example of the ANOVA, click on the 6 × 4 box. You will need to adjust the number of columns and rows for the type and amount of data that you need to present.

· Compare your table to Table C.3.

Table C.3. Step 1

Step 2: Correcting the Grid Lines

APA uses no vertical and just a few horizontal lines, so it is best to remove them all and then put back the ones that are needed. However, you need to first turn on your table gridlines if they aren’t on already:

· Home → In the Paragraph Box click the arrow next to the Border button and select View Gridlines. See Fig. C.4.

Then remove all the table border lines by:

· Right click anywhere on the table and select: Borders and Shading… to get Fig. C.5.

· Select the Borders tab, if it’s not already selected.

· Under Settings click the box to the left of None to remove them. Make sure under Apply to: it says Table.

· Click OK.

To add the correct lines for the APA table in our example:

· Left click to the left of the top row to select the top row.

· Right click anywhere on the top row and select: Borders and Shading… to get Fig. C.6.

· Make sure the solid line Style is selected and the Width is 1/2 pt.

· In the Preview picture click the Upper and Lower bar buttons. This inserts the top two lines in the table.

· Click OK.

· Select the last row in your table.

· Click the Lower bar button only. This inserts the bottom line in the table.

· Click OK.

Compare your table to Table C.4.

Table C.4. Step 2

Step 3: Adding the Text and Data

The text in the body of an APA table is an equal distance between the top and bottom of the cell:

· Select the table by pointing at the table and clicking the target that appears in the upper left corner.

· Click the Home tab and the Paragraph dialog box button (see Fig. C.7).

· Set Line spacing to Single (see note on Fig. C.7).

· Set Spacing Before and After to 6pt (see Fig. C.7).

· Click OK.

Enter the headings and data into each cell; the SPSS printout will have all of the information to accomplish this. Don’t worry about wrapping and aligning at this time. That is easier to do after the data are entered.

Compare your table to Table C.5.

Table C.5. Step 3

Source	df	SS	MS	F	p
Between groups	2	18.14	9.07	4.09	.02
Within groups	70	155.23	2.22
Total	72	173.37

Step 4: Adjusting the Column Widths

In an APA table, the Heads should be center aligned in the cell and the Stubs are left aligned. The numbers in the Cell are decimal aligned and centered under the Heads. Notice also that “Between groups” wrapped. Let’s first align the Heads and Stubs, then fix that wrap and finally align the data under the Heads.

To center align the Heads:

· Select the Header Row of your table by clicking to the left of the top row.

· Click the Center align button in the Formatting Toolbar (see Fig. C.8).

· The stub column should already be left aligned; if not, select the cells and click the Align Left button.

When MS Word creates a table it will generally make all of the columns the same width. To fix the wrap on the “Between groups” cell, that column will need to be widened slightly and then to keep the table within the margins the data columns will need to be decreased slightly. This may be a trial and error process to get the right spacing for your text.

· Right click anywhere on the Stubs column and select Table Properties… to get Fig. C.9.

· Click the Column Tab.

· Set the Preferred width to 1.4″.

· Click the Next Column button and set it to 1.0″.

· Repeat for all of the columns, setting them to 1.0″.

· Click OK.

Compare your table to Table C.6.

Table C.6. Step 4

Source	df	SS	MS	F	p
Between groups	2	18.14	9.07	4.09	.02
Within groups	70	155.23	2.22
Total	72	173.37

Step 5: Centering the Data Cells

To set the Cell columns so that they are all centered under their Heads, you will need to set the Tabs for each column of data cells to a Decimal Tab. We recommend this method of setting all columns the same and then adjusting them separately so they look right, because it requires less individual column adjustment:

· Select just the data cells by clicking in the upper left one, holding the shift key down, and then clicking in the lower right cell.

· Click the Paragraph dialog box button and then the Tabs button to get Fig. C.10.

· Clear all of the Tabs in the selected cells first by clicking the Clear All button.

· Click Alignment Decimal.

· Type .35″ in the Tab stop position box.

· Click the Set button.

· Click OK.

Compare your table to Table C.7.

Table C.7. Step 5

Source	df	SS	MS	F	p
Between groups	2	18.14	9.07	4.09	.02
Within groups	70	155.23	2.22
Total	72	173.37

Step 6: Touch Up the Finished Table

The df numbers look like they could be adjusted slightly to the right and the p slightly to the left. We show you this so that you will know how to get a perfect decimal alignment of the data under the column head text. This may be trial and error depending on your data.

· Select the cells of the df column by clicking first on the top data cell, “2,” hold the Shift key down, and then click on the bottom data cell, “72.”

· Click the Paragraph dialog box button and then the Tabs button.

· Clear all of the Tabs in the selected cells first by clicking the Clear All button.

· Under Alignment, Click Decimal.

· Type .45″ in the Tab stop position box, to set decimal tab .45″ from the left edge of the cell.

· Click the Set button.

· Click OK.

· Repeat for the p column but set it to .25″ to set decimal tab .25″ from the left edge of the cell.

Compare your finished table to Table C.8.

Table C.8. Step 6

Table 2

One-Way Analysis of Variance of Grades in High School by Father's Education

Source	df	SS	MS	F	p
Between groups	2	18.14	9.07	4.09	.02
Within groups	70	155.23	2.22
Total	72	173.37
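
As a quick arithmetic check on the entries in Table 2, the following minimal Python sketch recomputes the mean squares and the F ratio from the sums of squares and degrees of freedom shown in the table. Python is not part of the original appendix, and the variable names are illustrative. Because the table values are rounded to two decimals, the recomputed numbers agree with the table only to that precision.

```python
# Values copied from Table 2 (rounded to two decimals in the table).
ss_between, df_between = 18.14, 2
ss_within, df_within = 155.23, 70

# Each mean square is its sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the between-groups to the within-groups mean square.
f_ratio = ms_between / ms_within

# The Total row is the sum of the two rows above it.
ss_total = ss_between + ss_within
df_total = df_between + df_within

print(f"MS between = {ms_between:.2f}")
print(f"MS within  = {ms_within:.2f}")
print(f"F          = {f_ratio:.2f}")
print(f"Total      = SS {ss_total:.2f}, df {df_total}")
```

Checking a table this way before formatting it helps catch retyping errors, since the data have to be typed into the Word table by hand.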

Adjusting the SPSS Output to Approximate the APA Format

The preceding example shows how the standard SPSS output can be used to create a table in APA format. However, this does require some knowledge of your word processing program's table creation capabilities in order to accomplish this task. It also requires retyping the data into the table. You can adjust SPSS so that the output will approximate the APA format. We would not recommend submitting this to an APA journal, but it may be acceptable for student papers and some graduate program committees.

Follow these commands before running the SPSS analyses of your data:

· Click Edit → Options.

· Under the Pivot Tables tab select Academic (see Fig. C.11).

· Press OK.

· Run the SPSS statistical analysis.

Your outputs will look similar to Table C.9, which approximates an APA table.

In order to transfer it to MS Word:

· On the SPSS output, right click on the table that you want to transfer.

· Select Copy from the short menu presented (see Fig. C.12).

· Place the cursor in the MS Word file where you want to put the table.

· Select Paste in MS Word.

You can then move it and format it like any other image in MS Word, but it cannot be edited.

Table C.9. An Example of the SPSS "Academic" Output

Table 2

One-Way Analysis of Variance of Grades in High School by Father's Education

ANOVA

grades in h.s.

	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	18.143	2	9.071	4.091	.021
Within Groups	155.227	70	2.218
Total	173.370	72

Using Figures

Generally, the same concepts apply to figures as have been previously stated concerning tables: They should be easy to read and interpret, consistent throughout the document when presenting the same type of figure, kept on one page if possible, and supplement the accompanying text or table. Considering the numerous types of figures, I will not attempt to detail their construction in this document. However, one thing is consistent with all figures. In contrast to table titles, the figure number and caption are located below the figure, and the caption should provide enough detail so that the figure can be understood without reading the accompanying text. See Fig. C.13.

Some cautions in using figures:

1. Make it simple. Complex diagrams that require lengthy explanation should be avoided unless they are an integral part of the research.

2. Use a minimum number of figures for just your important points. If too many figures are used, your important points may be lost.

3. Integrate text and figure. Make sure your figure complements and enhances the accompanying text.

Figure 1. Scatterplot showing a curvilinear relationship between mathematics achievement and mosaic pattern test score.

Table Number

Title

Column Spanner

Headings

Cell

Stub Column

Use horizontal lines under the title, headings, and the body, but no vertical lines.

Body

Notes

The Title is in italics but the Table Number is not. Also note the space between the title and the top of the table.

Fig. C.3. Using MS Word to make a table.

Fig. C.4. Turning the View Gridlines on.

Fig. C.5. Clearing the borders.

Upper and Lower bar buttons.

Note: the Apply to is set to Cell but because you have selected the entire row, it will look like a solid line across the table.

Fig. C.6. Setting the horizontal lines.

Note: If you can’t see the gridlines, turn them on to better see where the rows and cells are. They won’t show when printed. See Fig. C.4 above.

Paragraph dialogue box button.

Note: With normal SPSS tables with numerical data and short heads, Single and the 6pt before and after sets the text equidistant in the cell. Wrapped text will come out single spaced, which is a better presentation for student papers and dissertations. However, if you have text that wraps and the journal requires that all text be double spaced, then you will need to Double space only those cells that wrap.

Fig. C.7. Setting line spacing within the cell.

Note: If the Align Buttons aren’t showing on the Paragraph Toolbar, you can select the proper alignment from the menu: Paragraph dialogue box.

Fig. C.8. Center aligning the heads.

Note: This can also be accomplished by dragging the vertical column separator lines until the “Between groups” is not wrapped and then dragging the other column separator lines so that they are within the margins. However, this produces uneven column spaces. We recommend the method outlined.

Fig. C.9. Adjusting the column widths.

Fig. C.10. Setting the decimal tabs.

Fig. C.11. Setting SPSS for an approximate APA format output.

Fig. C.12. Copying tables from SPSS.

Note: The figure number is italicized but the caption itself is not. Also, the caption text is sentence case (only the first word and proper nouns are capitalized and it ends in a period).

Fig. C.13. An example of a figure and caption in APA format.

Chapter7/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 7 Methods to Provide Evidence for Reliability and Validity

Study Guide

OBJECTIVES: You will be able to:

1. Understand the different methods for computing measurement/instrument reliability.
2. Explain the assumptions related to measures of reliability.
3. Compute different statistics to evaluate measurement reliability.
4. Understand the difference between exploratory factor analysis and principal components analysis.
5. Understand the assumptions related to factor analysis.
6. Assess sets of latent constructs.
7. Mathematically derive a smaller number of variables from a larger set of variables.
8. Understand how to use Factor Analysis and Alpha together to make summated scales.

KEY TERMS TO DEFINE AFTER READING THIS CHAPTER:

• Internal consistency reliability

o Cronbach’s Alpha

• Reliability

o Test-retest

o Equivalent forms reliability

o Interrater reliability

• Reliability with nominal data

o Cohen’s Kappa

• Exploratory factor analysis

• Principal components analysis

• Extraction

• Principal axis extraction

• Rotation

• Varimax (orthogonal)

• Kaiser-Meyer-Olkin (KMO)

• Bartlett’s Test of Sphericity

• Communalities

• Eigenvalues and variance explained

• Scree plot

Chapter7/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 7 Methods to Provide Evidence for Reliability and Validity

Outline

• Overview of Chapter

- Correlation Coefficients

• Measurement Reliability

- Interrater Reliability

- Internal Consistency

- Assumptions for Measuring Reliability

• Measurement Validity

- Overview

- Evidence Based on Internal Structures

• Problem 7.1 Cohen’s Kappa to Assess Reliability with Nominal Data

• Problem 7.2 Correlations and Paired t to Assess Interrater Reliability

o Interpretation of Output 7.2

o An Example of How to Write about Problem 7.2

• Problem 7.3 Exploratory Factor Analysis to Assess Evidence of Validity

- Conditions and assumptions for Factor Analysis

- Interpreting Output 7.3

- Example of How to Write about Problem 7.3

• Problem 7.4: Cronbach’s Alpha to Assess Internal Consistency Reliability

o Output 7.4a: Cronbach’s Alpha for Revised Competence Scale

o Scale: Alpha for the Revised Competence Scale

o Interpretation of Output 7.4a

o Example of How to Write About Problems 7.4a, 7.4b, and 7.4c

• The Use of Factor Analysis and Alpha to Make Summated Scales

• Interpretation Questions

• Extra SPSS Problems

Chapter7/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 7 Several Measures of Reliability

Answers to Extra Problems

7.1 Write a research question that can be answered from the data using a paired sample t test. Run the t test and provide a full interpretation.

GET FILE='X:\Morgan Documents\SPSS-5th Edition Introductory\Data Sets\college student data.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
T-TEST PAIRS=height WITH pheight (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.

Paired Samples Statistics

		Mean	N	Std. Deviation	Std. Error Mean
Pair 1	student height in inches	67.3000	50	3.93959	.55714
	same sex parent's height	66.7800	50	5.10418	.72184

Paired Samples Correlations

		N	Correlation	Sig.
Pair 1	student height in inches & same sex parent's height	50	.842	.000

The paired t test shows a mean difference of .52 inches, with t = 1.3, p = .142.

There is a significant and large correlation between the estimates of student height and same sex parent height, r = .842, p < .001. Although there is a strong correlation between these data, there is not a statistically significant difference between the estimated height of students and same sex parent height, paired t = 1.3, p = .142.
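
The paired t value can be approximately recovered from the two summary tables alone. The sketch below is in Python, which is not part of the original answer key, and it assumes the standard formula for the standard deviation of paired differences, sd_diff = sqrt(s1² + s2² − 2·r·s1·s2). Because the correlation is reported to only three decimals, the result is an approximation of the SPSS value rather than an exact reproduction.

```python
import math

# Summary values copied from the Paired Samples Statistics and
# Paired Samples Correlations tables.
n = 50
mean_student, sd_student = 67.30, 3.93959
mean_parent, sd_parent = 66.78, 5.10418
r = 0.842  # rounded correlation between the two height measures

# Mean of the paired differences.
mean_diff = mean_student - mean_parent

# Standard deviation of the differences, using the two standard
# deviations and the correlation between the paired measures.
sd_diff = math.sqrt(sd_student**2 + sd_parent**2 - 2 * r * sd_student * sd_parent)

# Standard error of the mean difference, then t with n - 1 degrees of freedom.
se_diff = sd_diff / math.sqrt(n)
t = mean_diff / se_diff
df = n - 1

print(f"mean difference = {mean_diff:.2f}")
print(f"t({df}) = {t:.2f}")
```

The strong correlation matters here: the larger r is, the smaller the standard deviation of the differences, and the more sensitive the paired test becomes to a given mean difference.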

7.2 Identify four variables that could be combined to make a summated scale. Check the Cronbach’s alpha for that scale. Write an interpretation of the resulting output.

RELIABILITY
  /VARIABLES=evalinst evalprog evalphys evalsocl
  /SCALE('Alpha for Univ Evaluation Scale') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE CORR
  /SUMMARY=TOTAL.

Reliability Statistics

Cronbach's Alpha	Cronbach's Alpha Based on Standardized Items	N of Items
.506	.533	4

Item Statistics

	Mean	Std. Deviation	N
positive evaluation, institution	3.39	.953	49
positive evaluation, major	3.27	.953	49
positive evaluation, facilities	2.76	1.071	49
positive eval, social life	3.08	1.187	49

Item-Total Statistics

	Scale Mean if Item Deleted	Scale Variance if Item Deleted	Corrected Item-Total Correlation	Squared Multiple Correlation	Cronbach's Alpha if Item Deleted
positive evaluation, institution	9.10	3.719	.658	.662	.103
positive evaluation, major	9.22	3.928	.586	.608	.176
positive evaluation, facilities	9.73	7.032	-.199	.047	.812
positive eval, social life	9.41	3.788	.400	.353	.327

Inter-Item Correlation Matrix

	positive evaluation, institution	positive evaluation, major	positive evaluation, facilities	positive eval, social life
positive evaluation, institution	1.000	.779	-.150	.579
positive evaluation, major	.779	1.000	-.139	.478
positive evaluation, facilities	-.150	-.139	1.000	-.213
positive eval, social life	.579	.478	-.213	1.000

I chose these four items because they all have to do with evaluating life as a student. It is clear, though, that they do not fit together as a construct. The alpha is just above .5 and would not be acceptable for a construct. However, note the negative correlation of facilities with the other items in the Inter-Item Correlation Matrix. Also note the last column of the Item-Total Statistics: if the evaluation of facilities were removed, the alpha would be higher.

I have run it here, without facilities.

RELIABILITY
  /VARIABLES=evalinst evalprog evalsocl
  /SCALE('Alpha for Univ Evaluation Scale') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE CORR
  /SUMMARY=TOTAL.

Reliability Statistics

Cronbach's Alpha	Cronbach's Alpha Based on Standardized Items	N of Items
.812	.825	3
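
The "Cronbach's Alpha Based on Standardized Items" values in the two Reliability Statistics tables can be approximately reproduced from the Inter-Item Correlation Matrix. The Python sketch below is an outside illustration (not part of the original answer key, and the function name is hypothetical). It assumes the usual standardized alpha formula, alpha = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean of the inter-item correlations. Since the correlations are rounded to three decimals, small differences in the third decimal are expected.

```python
from itertools import combinations

def standardized_alpha(corr):
    """Standardized Cronbach's alpha from a square inter-item correlation matrix."""
    k = len(corr)
    # Use each off-diagonal correlation once (upper triangle only).
    pairs = [corr[i][j] for i, j in combinations(range(k), 2)]
    r_bar = sum(pairs) / len(pairs)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Item order: institution, major, facilities, social life
# (values copied from the Inter-Item Correlation Matrix).
corr4 = [
    [1.000, 0.779, -0.150, 0.579],
    [0.779, 1.000, -0.139, 0.478],
    [-0.150, -0.139, 1.000, -0.213],
    [0.579, 0.478, -0.213, 1.000],
]

# Drop the facilities item (index 2) to mirror the second RELIABILITY run.
keep = [0, 1, 3]
corr3 = [[corr4[i][j] for j in keep] for i in keep]

alpha4 = standardized_alpha(corr4)
alpha3 = standardized_alpha(corr3)

print(f"standardized alpha, four items:  {alpha4:.3f}")
print(f"standardized alpha, three items: {alpha3:.3f}")
```

The calculation makes the effect of the facilities item visible: its negative correlations pull the mean inter-item correlation down, which is why alpha rises once that item is removed.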

Sample write-up

Cronbach’s alpha was computed for the overall evaluation by students on four items: evaluation of the institution, program, social life, and facilities. Alpha was .51. Facilities was negatively correlated with the other three items and was removed. The alpha for the remaining three items was .812, and the items hold together as one construct.

Chapter8/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner

Chapter 8 – Cross-Tabulation, Chi-Square, and Non Parametric Measures of Association Study Guide

OBJECTIVES: The student will be able to:

1. Create cross-tabulation tables from two variables which both have few levels.
2. State the assumptions and conditions for the use of Chi-square, Phi and Cramer’s V.
3. Determine if there is a statistically significant relationship between two nominal variables using Chi-square.
4. Assess the strength of the relationship (effect size) between two nominal variables using phi (or Cramer’s V).
5. Compute and interpret Kendall’s tau-b for ordinal and categorical variables.
6. Compute and interpret eta for the relationship between one nominal and one normal/scale variable.
7. Compute Cohen’s Kappa to assess interobserver reliability for two nominal variables.
8. Write about the results of the statistical tests performed in this chapter.

TERMINOLOGY:
• crosstabulation (crosstabs)
• count
• expected count
• Chi-square
• Pearson Chi-square test
• Fisher’s exact test
• Phi and Cramer’s V

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter8/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner

Chapter 8 – Cross-Tabulation, Chi-Square, and Nonparametric Measures of Association

Chapter Outline

I. Problem 8.1: Chi-Square and Phi (or Cramer’s V)
   A. Requires two nominal or dichotomous variables.
      1. Nominal variables have distinct, unordered levels and each subject is in only one level.
      2. Less appropriate if either variable has three or more ordered levels.
         a. These statistics do not take into account order.
         b. Sacrifice power if you use these statistics with ordinal or scale variables.
   B. Chi-Square
      1. Requires relatively large sample size and relatively even split of subjects among the levels.
      2. Expected counts in 80% of the cells should be greater than 5.
      3. Tells if the relationship is statistically significant but does not tell the effect size.
   C. Fisher’s Exact Test
      1. Can be used for 2x2 crosstabs instead of chi-square for small samples.
      2. Tells if the relationship is statistically significant but does not tell the effect size.
   D. Phi or Cramer’s V
      1. Provides a test of statistical significance.
      2. Provides information about the strength of the association between two categorical variables and can be used as a measure of effect size.
         a. Measure is similar to r.
         b. Values near 0 indicate no relationship.
         c. Values near +/- 1.0 indicate strong relationships.
      3. Phi is appropriate for 2x2 crosstabs.
      4. Cramer’s V is appropriate for larger crosstabs.
   E. Assumptions and Conditions for the Use of Chi-Square, Phi and Cramer’s V
      1. The data for the variables must be independent.
      2. Data are treated as nominal, even if ordered.
      3. For chi-square, if the expected frequencies are less than 5, the test of significance is too liberal. At least 80% of the expected frequencies should be 5 or larger. All should be at least 5 if you have a 2x2 chi-square.
   F. Follow directions in book to compute and interpret chi-square, phi (or Cramer’s V).
II. Problem 8.2: Other Nonparametric Associational Statistics
   A. Follow directions in book to compute and interpret phi/Cramer’s V and Kendall’s tau-b.
III. Problem 8.3: Cross-Tabulation and Computation of Eta
   A. Eta is used when one variable is nominal and the other is approximately normal/scale.
   B. Follow directions in book to compute and interpret eta.
IV. Problem 8.4: Cohen’s Kappa for Reliability with Nominal Data
   A. Cohen’s Kappa is used when there are two nominal variables with the same values.
      1. Usually two raters’ observations or scores using the same codes.
      2. Allows two raters to be checked for reliability (agreement between the measures).
   B. Follow the directions in the book to compute and interpret Cohen’s Kappa.

Chapter8/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 8 – Cross-Tabulation, Chi-Square, and Non Parametric Measures of Association

Using college student data from file, do the following problems. Print your outputs after typing your interpretations on them. Please circle the key parts of the output that you discuss.

8.1. Run crosstabs and interpret the results of chi-square and phi (or Cramer’s V), as discussed in Chapter 6 and in the interpretation of Output 8.1, for: (a) gender and marital status and (b) age group and marital status.

Selection of Statistic

Problem 8.1 is designed to analyze two categorical variables (i.e., ones that are nominal or have only a few ordered levels). Remember, nominal variables are variables that have distinct unordered levels; each subject is in only one level (you can only be male or female). Chi-square (χ²) or phi/Cramer’s V are good choices for statistics when comparing two nominal variables. They are less appropriate if either variable has three or more ordered levels because these statistics do not take into account the order and, thus, sacrifice power.

Chi-square requires a relatively large sample size and/or a relatively even split of the subjects among the levels because the expected counts in 80% of the cells should be greater than 5. Fisher’s exact test for 2x2 (only) crosstabs can be reported instead of chi-square for small samples. Chi-square and the Fisher’s exact test provide similar information about differences between groups (and relationships among variables); however, they only tell us whether the difference (or relationship) is statistically significant (not likely to be due to chance). They do not tell the effect size (the strength of the relationship).

Phi and Cramer’s V provide information about the strength of the association between two categorical variables and can be used as a measure of the effect size similar to r. If one has a 2x2 cross tabulation, phi is the appropriate statistic. For larger crosstabs, Cramer’s V is used. The numbers in the crosstabs description refer to the number of levels in each of the variables. Thus, for gender and marital status, the crosstab is 2x3 because gender has two levels and marital status has three levels.

Assumptions of Chi-square

• If the expected frequencies are less than 5, the test of significance is too liberal. At least 80% of the expected frequencies should be 5 or larger. All should be at least 5 if you have a 2x2 chi-square.

• Data are treated as nominal, even if ordered.

Assumptions of Phi/Cramer’s V (measure of association for nominal variables)

How to Produce the Selected SPSS Output

To answer Problem 8.1a with Windows:

• Click on Analyze ⇒ Desc Stat ⇒ Crosstabs. This will open the Crosstabs window
• Highlight gender
• Click on the arrow to move gender over to the Row(s) box
• Highlight marital
• Click on the arrow to move it over to the Column(s) box
• Click on Statistics. This will open the Crosstabs: Statistics window
• Click on Phi/Cramer’s V and chi-square
• Click on Continue
• Click on Cells. This will open the Crosstabs: Cell Display window
• Click on Expected, Total and Standardized
• Click on Continue and OK

To answer Problem 8.1a with syntax:

CROSSTABS
  /TABLES=gender BY marital
  /FORMAT=AVALUE TABLES
  /STATISTIC=CHISQ PHI
  /CELLS=COUNT EXPECTED TOTAL SRESID.

SPSS Output for Problem 8.1a

Crosstabs

Case Processing Summary

	Valid N	Valid Percent	Missing N	Missing Percent	Total N	Total Percent
gender of student * marital status	49	98.0%	1	2.0%	50	100.0%

Note: One observation is missing.

gender of student * marital status Crosstabulation

		marital status
gender of student		single	married	divorced	Total
males	Count	14	7	5	26
	Expected Count	10.6	9.6	5.8	26.0
	% of Total	28.6%	14.3%	10.2%	53.1%
	Std. Residual	1.0	-.8	-.3
females	Count	6	11	6	23
	Expected Count	9.4	8.4	5.2	23.0
	% of Total	12.2%	22.4%	12.2%	46.9%
	Std. Residual	-1.1	.9	.4
Total	Count	20	18	11	49
	Expected Count	20.0	18.0	11.0	49.0
	% of Total	40.8%	36.7%	22.4%	100.0%

Note: marital status is the variable name. The levels are: single, married, and divorced.

Note: Count is the number of subjects who responded in that cell.

Note: Expected Count is the number of subjects expected by chance to be in that cell.

Note: Std. Residual is the difference between count and expected count, then standardized. A residual greater than 2 indicates that the difference is a major contributor to a significant chi-square. None of the residuals are greater than 2 in this output.

Chi-Square Tests

	Value	df	Asymp. Sig. (2-sided)
Pearson Chi-Square	4.011a	2	.135
Likelihood Ratio	4.095	2	.129
Linear-by-Linear Association	2.392	1	.122
N of Valid Cases	49

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.16.

Note: The Pearson Chi-Square line indicates the results of the chi-square statistic.

Note: Asymp. Sig. is the significance level. This chi-square is not significant.

Symmetric Measures

		Value	Approx. Sig.
Nominal by Nominal	Phi	.286	.135
	Cramer's V	.286	.135
N of Valid Cases		49

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
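
The Pearson chi-square and the phi/Cramer's V values in Output 8.1a can be recomputed by hand from the observed counts. The following Python sketch is an outside illustration (Python and the function names are not part of the original manual, which uses SPSS). It assumes the standard definitions: expected count = row total × column total / N, chi-square = sum of (observed − expected)² / expected, and Cramer's V = sqrt(chi-square / (N × (smaller of rows, columns − 1))).

```python
import math

# Observed counts from the gender of student * marital status crosstabulation
# (rows: males, females; columns: single, married, divorced).
observed = [[14, 7, 5],
            [6, 11, 6]]

def pearson_chi_square(table):
    """Return (chi-square, df, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell under independence of rows and columns.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V; equals phi when the smaller table dimension is 2."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

chi2, df, expected = pearson_chi_square(observed)
n = sum(map(sum, observed))
v = cramers_v(chi2, n, len(observed), len(observed[0]))
min_expected = min(min(row) for row in expected)

print(f"chi-square = {chi2:.3f}, df = {df}")
print(f"Cramer's V = {v:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

Because one of the variables here has only two levels, the divisor min(rows, columns) − 1 equals 1, which is why phi and Cramer's V take the same value in this output.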

How to Produce the Selected SPSS Output

To answer Problem 8.1b with Windows:

• Click on Analyze ⇒ Desc Stat ⇒ Crosstabs. This will open the Crosstabs window
• Highlight age group and move it into the Row(s) box
• Highlight marital and move it into the Column(s) box
• Click on Statistics. This will open the Crosstabs: Statistics window
• Click on Phi, Cramer’s V and chi-square
• Click on Continue
• Click on Cells. This will open the Crosstabs: Cell Display window
• Click on Expected, Total and Standardized
• Click on Continue and OK

To answer Problem 8.1b with syntax:

CROSSTABS
  /TABLES=age BY marital
  /FORMAT=AVALUE TABLES
  /STATISTIC=CHISQ PHI
  /CELLS=COUNT EXPECTED TOTAL SRESID.

Note: In the Symmetric Measures table, use the Phi line for a 2x2 crosstab.

Note: Use the Cramer’s V line if one or both variables have 3 or more levels. With a 3x2 crosstab, phi and Cramer’s V are the same.

SPSS Output for Problem 8.1b

Crosstabs

Case Processing Summary

	Valid N	Valid Percent	Missing N	Missing Percent	Total N	Total Percent
age group * marital status	49	98.0%	1	2.0%	50	100.0%

age group * marital status Crosstabulation

		marital status
age group		single	married	divorced	Total
less than 22	Count	13	1	2	16
	Expected Count	6.5	5.9	3.6	16.0
	% of Total	26.5%	2.0%	4.1%	32.7%
	Std. Residual	2.5	-2.0	-.8
22-29	Count	7	6	5	18
	Expected Count	7.3	6.6	4.0	18.0
	% of Total	14.3%	12.2%	10.2%	36.7%
	Std. Residual	-.1	-.2	.5
30 or more	Count	0	11	4	15
	Expected Count	6.1	5.5	3.4	15.0
	% of Total	.0%	22.4%	8.2%	30.6%
	Std. Residual	-2.5	2.3	.3
Total	Count	20	18	11	49
	Expected Count	20.0	18.0	11.0	49.0
	% of Total	40.8%	36.7%	22.4%	100.0%

Chi-Square Tests

	Value	df	Asymp. Sig. (2-sided)
Pearson Chi-Square	23.173a	4	.000
Likelihood Ratio	28.888	4	.000
Linear-by-Linear Association	11.590	1	.001
N of Valid Cases	49

a. 3 cells (33.3%) have expected count less than 5. The minimum expected count is 3.37.

Note: This is not good because for chi-square no more than 20% of the cells should have expected counts less than 5.

Symmetric Measures

		Value	Approx. Sig.
Nominal by Nominal	Phi	.688	.000
	Cramer's V	.486	.000
N of Valid Cases		49

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
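
The expected-count footnote and the standardized residuals in Output 8.1b can be checked directly from the observed counts. The Python sketch below is an outside illustration, not part of the original manual. It assumes the usual definitions: expected count = row total × column total / N, and standardized residual = (observed − expected) / sqrt(expected). It reports how many cells fall below an expected count of 5 and which cells have residuals larger than 2 in absolute value.

```python
import math

# Observed counts from the age group * marital status crosstabulation
# (rows: less than 22, 22-29, 30 or more; columns: single, married, divorced).
observed = [[13, 1, 2],
            [7, 6, 5],
            [0, 11, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if age group and marital status were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Standardized residual: how far each observed count is from its expected
# count, in units of the square root of the expected count.
std_resid = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(observed, expected)]

# Check the chi-square condition on small expected counts.
cells = [e for row in expected for e in row]
n_small = sum(1 for e in cells if e < 5)
pct_small = 100 * n_small / len(cells)

# Cells whose residuals mark them as major contributors to the chi-square.
large = [(i, j, round(r, 1))
         for i, row in enumerate(std_resid)
         for j, r in enumerate(row)
         if abs(r) > 2]

print(f"{n_small} cells ({pct_small:.1f}%) have expected count less than 5")
print(f"minimum expected count = {min(cells):.2f}")
print("cells with |standardized residual| > 2 (row, column, residual):", large)
```

Listing the large residuals this way points to the same cells discussed in the write-up: the youngest group is over-represented among single students and the oldest group among married students.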

Description of Output 8.1a and b

The Case Processing Summary table for Output 8.1a indicates that there is one person

missing data about either gender or marital status, or both. There is also one person

missing in Output 8.1b.

The 8.1a cross tabulation table shows the number of individuals (counts) with each

marital status that are males or females. The expected count is the number one would

expect to have in each category by chance based on the column and row totals. Each cell

also shows a percent of the grand total, which indicates the percentages of persons of

each gender occupying each marital status.

Look at the difference between the observed count and the expected count. In tables

larger than 2x2, the differences are of different sizes. The chi-square will assist in

determining whether these differences could have occurred by chance. If the statistical

test is significant, then it is important to identify and discuss the effect size by examining

phi or Cramer’s V. It is also critical to determine which cells are contributing to the

significant results if we want to discuss the findings fully; to do this, we examine the

standardized residuals to see which one(s) are more than 2.0. Note that in Problem 8.1b,

age is an ordered variable (going from low to high) so chi-square and Cramer’s V are not

as powerful as would be an ordinal statistic (e.g., Kruskal-Wallis).

For 3x3 and larger cross tabulations, phi and V are different; use Cramer’s V.

You can see from footnote a for the chi-square tests table in Output 8.1a, that the

assumption about cell sizes has not been violated because no cells have an expected count

of less than 5. On the other hand, three of the cells for the 8.1b chi-square have an

expected count of less than 5 so an assumption has been violated and we should be

cautious about the interpretation of chi-square. However, the chi-square Sig. is so much

less (.000) than .05 that we probably don’t need to worry about it in this case.

It’s usually not necessary to have a table to report one chi-square. Thus, we have

combined Output 8.1a and 8.1b into one table, Table 8.1.

Example of How to Write About Problem 8.1a and b

Results

Table 8.1 shows the Pearson chi-square results and indicates that males and

females are not significantly different on whether they are single, married, or divorced

(χ2 = 4.01, df = 2, N = 49, p = .135).

Table 8.1 also shows that marital status is different for each of the age groups (χ2

= 23.17, df = 4, N = 49, p < .001). Cramer’s V indicates that the strength of the

association between the two variables is .49 and, thus, the effect size is relatively large.

Students under 22 are more likely than expected under the null hypothesis to be single

and older students (30 or more) are more likely to be married. However, 33% of the cells had

expected frequencies less than 5 so we need to be cautious about this conclusion.


Table 8.1

Chi-square Analyses of Prevalence of Gender and Age Among Single, Married, and Divorced Students

                             Marital Status
Variable           n    Single   Married   Divorced      χ2       p
Gender                                                  4.01     .135
  Males           26      14        7         5
  Females         23       6       11         6
Age Group                                              23.17    <.001
  Less than 22    16      13        1         2
  22-29           18       7        6         5
  30 or more      15       0       11         4
Totals            49      20       18        11

8.2. Select two other appropriate variables, run and interpret the output as we did in Output 8.1.

Selection of the Statistic (Same as for Problem 8.1.)

How to Produce the Selected SPSS Output (Same as 8.1, but the variables are different)


SPSS Output for Problem 8.2a

Crosstabs

Case Processing Summary

                                                            Cases
                                                 Valid        Missing        Total
                                               N  Percent    N  Percent    N  Percent
gender of student * television shows-sports   50  100.0%     0    .0%     50  100.0%

television shows-sports * gender of student Crosstabulation

                                                gender of student
                                                males    females    Total
television shows-sports   no    Count               2        22        24
                                Expected Count   12.5      11.5      24.0
                                % of Total       4.0%     44.0%     48.0%
                                Std. Residual    -3.0       3.1
                          yes   Count              24         2        26
                                Expected Count   13.5      12.5      26.0
                                % of Total      48.0%      4.0%     52.0%
                                Std. Residual     2.9      -3.0
Total                           Count              26        24        50
                                Expected Count   26.0      24.0      50.0
                                % of Total      52.0%     48.0%    100.0%

Chi-Square Tests

                                 Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              35.258b     1      .000
Continuity Correction(a)        31.974      1      .000
Likelihood Ratio                41.365      1      .000
Fisher's Exact Test                                              .000         .000
Linear-by-Linear Association    34.553      1      .000
N of Valid Cases                    50

a. Computed only for a 2x2 table

b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.52.

All four cells have standardized residuals greater than 2 in absolute value, so all are contributing to the significant chi-square. Males say yes more often, and no less often, than expected; females show the reverse pattern.


Symmetric Measures

                                    Value    Approx. Sig.
Nominal by Nominal   Phi            -.840       .000
                     Cramer's V      .840       .000
N of Valid Cases                       50

a. Not assuming the null hypothesis.

b. Using the asymptotic standard error assuming the null hypothesis.

Description of Output 8.2a (Same as for Problem 8.1.)
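For a 2x2 table such as the one in Output 8.2a, phi, the Pearson chi-square, and the continuity-corrected chi-square can all be obtained from the four cell counts. The sketch below uses the standard 2x2 shortcut formulas; the variable names are illustrative, and the sign of phi depends only on how the categories happen to be ordered, so it is the magnitude that indicates effect size.

```python
import math

# Cell counts from Output 8.2a.
# Rows: watches sports = no, yes. Columns: males, females.
a, b = 2, 22     # no:  males, females
c, d = 24, 2     # yes: males, females
n = a + b + c + d

row_no, row_yes = a + b, c + d
col_males, col_females = a + c, b + d
margin_product = row_no * row_yes * col_males * col_females

# Phi for a 2x2 table: (ad - bc) / sqrt(product of the four marginal totals).
phi = (a * d - b * c) / math.sqrt(margin_product)

# Pearson chi-square equals N * phi squared for a 2x2 table.
chi2 = n * phi ** 2

# Continuity (Yates) correction: shrink |ad - bc| by N/2 before squaring.
chi2_corrected = n * (abs(a * d - b * c) - n / 2) ** 2 / margin_product

print(f"phi = {phi:.3f}")
print(f"Pearson chi-square = {chi2:.3f}")
print(f"continuity-corrected chi-square = {chi2_corrected:.3f}")
```

With the counts above, these three values should match the Phi, Pearson Chi-Square, and Continuity Correction rows of the SPSS output.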

Example of How to Write About Problem 8.2a

Results

Chi-square analysis indicated a significant difference between males (24 out of

26) and females (2 out of 24) in the proportions who watch sports programs on television

(χ2 = 35.26, df = 1, N = 50, p < .001). As hypothesized, male students were more likely to

watch sports on TV than were female students. The phi value (-.84) indicates that the effect size

was very large (Cohen, 1988).

Discussion

The finding that males tend to watch sport programs more than females is similar

to findings in other studies (Smith, 1999; Jones, 1998). In today’s society many boys feel

pressured to excel at sports. This might be explained partially from the modeling of adult

males who watch sports programs on television.


Table 8.2a¹

Chi-square Analysis of Responses to Watching Sports and Gender

Gender      Yes    No      χ2       p
Males        24     2    35.26    <.001
Females       2    22

Example of How to Write About Problem 8.2b

Results

Chi-square analyses in Table 8.2b indicated significant differences between males

and females in the proportions who watch sitcoms, movies, and sports, but not news.

Females were more likely than males to watch sitcoms and movies. Males were more

likely than females to watch sports. The effect sizes were very large for movies and

sports.

Table 8.2b

Chi-square Analyses of “Yes” Responses to Type of Television Viewed and Gender

                        Gender
Type of show    Males      Females       χ2       p
                (n=26)     (n=24)
Sitcoms           11         21        11.06     .001
Movies             0         18        30.47    <.001
Sports            24          2        35.26    <.001
News              14          9         1.34     .247

1 One would not usually make a table for one chi-square. See table 8.2b for an example of a table with several related chi-squares. If you have both text and a table do not repeat the statistics (χ2 = 35.26, etc.) in the text.


8.3. Is there an association between having children or not and watching TV sitcoms?

8.4. Is there a difference between students who have children and those who do not in regard to their age group?

8.5. Compute an appropriate statistic and effect size measure for the relationship between gender and evaluation of social life.


Chapter9/Chapter Guides.pdf


Chapter 9 – Correlation and Regression Study Guide

OBJECTIVES: The student will be able to:

1. Create and interpret scatterplots.
2. Explain the assumptions and conditions for the Pearson Correlation versus the Spearman rho.
3. Compute and interpret the Pearson Correlation.
4. Compute and interpret the Spearman Rho.
5. Compute a correlation matrix to indicate the associations among the pairs of three or more variables.
6. Compute and interpret Cronbach’s alpha as a measure of reliability.
7. Compute and interpret a bivariate regression.
8. Compute and interpret a multiple regression (Enter method).
9. Write about the results of the statistical tests performed in this chapter.

TERMINOLOGY:
• scatterplot
• linear regression line
• Pearson product moment correlation (r)
• Spearman’s rho (rs)
• one-tailed test vs. two-tailed test
• correlation matrix
• Cronbach’s alpha
• internal consistency reliability
• inter-item correlation matrix
• summated scale
• bivariate (simple linear) regression
• unstandardized coefficients
• multiple regression
• correlation coefficient (R)
• Adjusted R2
• standardized beta coefficients

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter9/Chapter Outlines.pdf


Chapter 9 – Correlation and Regression Chapter Outline

I. Introduction
   A. Correlation Values
      1. Ranges from –1.0 to +1.0
         a. 0 = no correlation (zero correlation)
         b. +1.0 = perfect positive correlation (as one value goes up the other goes up)
         c. –1.0 = perfect negative correlation (as one value goes up the other goes down)
   B. Assumptions and Conditions for the Pearson Correlation
      1. The two variables have a linear relationship.
      2. Scores on one variable are normally distributed for each value of the other variable and vice versa.
      3. Outliers can have a big effect on correlation.
   C. Assumptions and Conditions for Spearman Rho
      1. Data on both variables are at least ordinal.
      2. Scores on a variable are monotonically related to the other variable (as the value of one variable increases, the other consistently increases or consistently decreases, but not necessarily in a linear fashion).
      3. Rho is computed by ranking the data for each variable and then computing a Pearson product moment correlation.

II. Problem 9.1: Scatterplots to Check Assumptions
   A. Features of a scatterplot.
      1. Plot or graph to show the association of two variables.
      2. With a high positive correlation, the points fall close to a straight line (linear regression line) running from the lower left corner of the plot to the upper right corner of the plot.
      3. With a high negative correlation, the points fall close to a straight line (linear regression line) running from the upper left corner of the plot to the lower right corner of the plot.
   B. Follow the directions in the book to create and interpret a scatterplot.

III. Problem 9.2: Bivariate Pearson and Spearman Correlations
   A. Pearson Product Moment Correlation
      1. Bivariate parametric statistic
      2. Used when both variables are approximately normally distributed.
   B. Spearman Rho
      1. Nonparametric equivalent of the Pearson correlation.
      2. Use with ordinal data or if assumptions for the Pearson correlation are violated.
   C. Follow the directions in the book to compute and interpret a Pearson correlation and a Spearman rho.

IV. Problem 9.3: Correlation Matrix for Several Variables


   A. Follow the directions in the book to compute and interpret a correlation matrix. This is used when you have more than 2 ordinal or normally distributed variables that you want to correlate.

V. Problem 9.4: Internal Consistency Reliability with Cronbach’s Alpha
   A. Cronbach’s Alpha
      1. Common measure of internal consistency reliability.
      2. Useful for checking several scores that the researcher wants to add together to obtain a summated score.
      3. Based upon a correlation matrix and is interpreted similarly to other reliability measures.
         a. Alpha should be positive.
         b. Value should be greater than .70 to provide support for internal consistency reliability.
   B. Follow the directions in the book to compute and interpret Cronbach’s Alpha.

VI. Problem 9.5: Bivariate or Simple Linear Regression
   A. Utilized to predict scores on one normally distributed variable from scores on another. (Also referred to as simple regression or simple linear regression.)
   B. Follow the directions in the book to compute and interpret a bivariate regression.

VII. Problem 9.6: Multiple Regression
   A. Utilized to make predictions about the dependent variable from a combination of several predictor variables.
      1. Can use a combination of interval, scale, and/or dichotomous independent/predictor variables.
      2. Can look for the combination of variables that provide the best prediction.
   B. Assumptions and Conditions of Multiple Regression
      1. The relationship between each of the predictor variables and the dependent variable should be linear.
      2. Errors are normally distributed.
      3. Variance of the residuals (the difference between the actual and predicted scores) is constant.
      4. Multicollinearity (condition where there are high intercorrelations among some set of the predictor variables) is minimal.
   C. Follow the directions in the book to compute and interpret a multiple regression.
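The outline's point that rho is a Pearson correlation computed on ranks (item I.C.3) can be illustrated with a short sketch. The data below are made up for illustration and do not come from the College Student data set; the function names are likewise assumptions. Tied scores receive the average of the ranks they span, which is the usual convention.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores: y rises with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because the relationship is perfectly monotonic but not linear, rho reaches 1.0 while Pearson r falls slightly short of it, which is the distinction drawn between the two sets of assumptions in section I.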

Chapter9/Extra SPSS Problems.pdf


Chapter 9 – Correlation and Regression

Using your College Student data do the following problems. Print your outputs after typing your interpretations on them. Please circle the key parts of the output that you discuss.

9.1. What is the correlation between students’ height and parent’s height? Also produce a scatterplot. Interpret the results, including statistical significance, direction, and effect size.

Selection of the Statistic

The variables of interest for Problem 9.1 are height and parent’s height. Both of these

variables are scale. (We found in Chapter 5 that they were normally distributed.) The

best choice for a statistic when you are interested in investigating the relationship

between two scale variables is Pearson correlation. Pearson correlation produces an r.

The closer the r value is to +1.0, the stronger the positive relationship that exists

between the two variables. If the r value is negative, this indicates a negative

relationship. For example, hours spent watching TV is negatively related to hours spent

studying (see Problem 9.2). This indicates that students who watch a lot of TV tend to

study less.

Assumptions of Correlation (Pearson r)

1. The two variables are linearly related (Pearson r will not detect a curvilinear relationship)

2. Scores on one variable are normally distributed for each value of the other variable and vice versa. (If degrees of freedom are greater than 25,

failure to meet this assumption has little consequence.)

Assumptions of Spearman Rho (correlation based on ranked data)

• Data on both variables are at least ordinal


How to Produce the Selected SPSS Output

To answer Problem 9.1 with Windows:

• Click on Analyze ⇒ Correlate ⇒ Bivariate. This will open the Bivariate Correlations window

• Highlight student height in inches and same sex parent’s height
• Click on the arrow to move the variables into the Variable(s) box
• Click on Options – This will open the Bivariate Correlations: Options window
• Click on Means and Standard deviations and Exclude cases listwise
• Click on Continue and O.K.

To answer Problem 9.1 with syntax:

CORRELATIONS
  /VARIABLES=height pheight
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=LISTWISE.

SPSS Output for Problem 9.1

Correlations

Descriptive Statistics

                              Mean      Std. Deviation    N
student height in inches     67.3000       3.9396        50
same sex parent's height     66.7800       5.1042        50

Correlations(a)

                                                    student height    same sex
                                                    in inches         parent's height
student height in inches    Pearson Correlation         1                .842**
                            Sig. (2-tailed)             .                .000
same sex parent's height    Pearson Correlation       .842**              1
                            Sig. (2-tailed)            .000               .

**. Correlation is significant at the 0.01 level (2-tailed).

a. Listwise N=50

To create the graph for Problem 9.1 with Windows:

• Click on Graphs ⇒ Scatter… This will open the Scatterplot window
• Click on Simple and then Define
• Highlight same sex parent’s height
• Click on the arrow to move it into the Y Axis box
• Highlight student height in inches
• Click on the arrow to move it into the X Axis box
• Click on Titles. This will open the Title box
• Type “Correlation of height and ” in Line 1 under Title
• Type “same sex parents’ height” in Line 2 under Title
• Click on Continue and O.K.
• To get the regression line – Double click on the Scatterplot. This will make the SPSS Chart Editor appear
• Click on Chart ⇒ Options
• Click on Total under Fit
• Click on Continue and O.K.
• Close the SPSS Chart Editor box

To create the graph for Problem 9.1 with syntax:

GRAPH
  /SCATTERPLOT(BIVAR)=height WITH pheight
  /MISSING=LISTWISE.

Callouts on the Correlations table:

• This table gives the correlations (or the relationship) between the variables.
• Remember to only look at one side of the diagonal.
• Another word for relationship
• This line gives the exact significance level to three decimals.
• The ** indicate the correlation is significant.

Graph

[Scatterplot titled "Correlation of height and same sex parents' height". X axis: student height in inches (58 to 76). Y axis: same sex parent's height. Fitted regression line shown; Rsq = 0.7084]

Description of Output 9.1

The first table shows the mean, standard deviation and N for each variable. The

second table presents the correlations. Remember to only look on one side of the

diagonal, as the information is redundant. The top number in the correlation matrix (in

this case it is .842) is the value of r. The second number is the p value (in this case .000),

and the number at the bottom is the N, which was included in the analysis. The graph

shows the relationship between the two variables. R squared indicates the amount of

variance in same sex parent’s height that can be predicted from student height in inches.

It is helpful to look at the regression line and which way it goes. The line in this graph shows a positive correlation because its slope is from lower left to the upper right of the plot.

This is the r-squared value. It tells us that 71 percent of the variance in parent’s height can be predicted from student height.


Example of APA Results and Discussion for Problem 9.1

Results

There was a statistically significant correlation between student height in inches

and same sex parent's height, r (48) = .842, p < .001. The direction of the correlation was

positive, which means that tall parents tend to have tall children, and vice versa. Using

Cohen's (1988) guidelines, the r indicates a very large effect size. The r squared

indicates that 71% of the variance in student height in inches can be predicted from the

same sex parent's height.

9.2. Write a question that can be answered via correlational analysis with two

approximately normal or scale variables. Run the appropriate statistics to answer the question. Interpret the results.

Selection of the Procedure

There are several variables you can choose to answer this problem. Any two variables

that are scale would be appropriate. If you are unsure what the measurement is for a

variable, you can check the codebook in Appendix B, or look on the Variable View

Screen (the last column will list the measurement type). If one or both variables in a

correlation is ordered but not scale (ranks or severely non normal), you can use the

Spearman correlation (rho).

How to Produce the Selected SPSS Output (See Problem 9.1)


SPSS Output for Problem 9.2

Correlations

Descriptive Statistics

                                  Mean     Std. Deviation    N
hours of study per week           15.62        8.31         50
amount of tv watched per week     11.98        6.10         50

Correlations

                                                         hours of study    amount of tv
                                                         per week          watched per week
hours of study per week          Pearson Correlation        1.000             -.316*
                                 Sig. (2-tailed)              .                .025
                                 N                           50                 50
amount of tv watched per week    Pearson Correlation        -.316*            1.000
                                 Sig. (2-tailed)             .025               .
                                 N                           50                 50

*. Correlation is significant at the 0.05 level (2-tailed).

Description of Output 9.2

When viewing an output of correlations, first check to see if the N’s are the same for each

variable; only persons with data for both variables will be included in the correlation.

Note that correlations assess the strength of the association between two variables. Then

check the correlations table. Remember to only look at one side of the diagonal since the

information is repetitive. The 1.000’s are correlations of the variable with itself.

To help yourself to remember to look at each relationship only once, it can be helpful to draw a line similar to this on your output. Note that the correlation (-.32) is the same above and below the line.

Notice this significance level. Two asterisks (**) means p<.01 and one (*) means p<.05.


Example of How to Write About Problem 9.2

Results

There was a statistically significant negative correlation between hours of study

per week and amount of TV watched per week, r (48) = -.32, p = .025. The direction of

the correlation was negative, indicating that students who study a lot tend not to watch a

lot of TV. Using Cohen's guidelines, the r = -.32 indicates a medium effect size.
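The significance level that SPSS prints for a Pearson correlation comes from a t test with N - 2 degrees of freedom. As a rough check on the values reported for Problem 9.2, the sketch below converts r to t and then integrates the t distribution numerically (Simpson's rule) to approximate the two-tailed p. This is an illustrative calculation with assumed function names, not the routine SPSS itself uses.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson r based on n pairs to a t statistic with n - 2 df."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + t * t / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate the two-tailed p value with Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3       # probability between 0 and |t|
    return max(0.0, 1 - 2 * area)


# Values from Output 9.2: study hours with TV hours.
r, n = -0.316, 50
t, df = t_from_r(r, n)
p = two_tailed_p(t, df)

print(f"r = {r}, df = {df}, t = {t:.3f}, two-tailed p = {p:.3f}")
print(f"r squared = {r * r:.3f}")
```

The df of 48 matches the value in parentheses in the write-up, r (48), and the approximate p should agree with the Sig. (2-tailed) value in the output to three decimals.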

9.3. Make a correlation matrix using at least four appropriate variables. Identify, using the variable names, the two strongest and two weakest correlations. What were the r and p values for each correlation?

Selection of the Statistic

For Pearson correlations to be appropriate both variables should be scale level (ordered

and normally distributed). Thus, you should select four scale variables to use when

answering this problem.

How to Produce the Selected SPSS Output (See Problem 9.1)

SPSS Output for Problem 9.3

Correlations

Descriptive Statistics

                                  Mean     Std. Deviation    N
amount of tv watched per week     11.98        6.10         50
hours of study per week           15.62        8.31         50
student's current gpa             3.172        .391         50
hours per week spent working      26.12       14.86         49

Correlations

                                                        amount of tv   hours of    student's   hours per
                                                        watched per    study per   current     week spent
                                                        week           week        gpa         working
amount of tv watched per week   Pearson Correlation       1.000         -.316*      -.253       -.541**
                                Sig. (2-tailed)             .            .025        .076        .000
                                N                          50             50          50          49
hours of study per week         Pearson Correlation       -.316*        1.000        .109        .219
                                Sig. (2-tailed)            .025           .          .453        .131
                                N                          50             50          50          49
student's current gpa           Pearson Correlation       -.253          .109       1.000        .303*
                                Sig. (2-tailed)            .076          .453         .          .034
                                N                          50             50          50          49
hours per week spent working    Pearson Correlation       -.541**        .219        .303*      1.000
                                Sig. (2-tailed)            .000          .131        .034         .
                                N                          49             49          49          49

*. Correlation is significant at the 0.05 level (2-tailed).

**. Correlation is significant at the 0.01 level (2-tailed).

Description of Output 9.3

The Descriptive Statistics table indicates that hours per week spent working has only 49

observations. The other three variables have 50.

In the second table, called a correlation matrix, each of the four variables is correlated
with the other three. Notice, in the first column, that the amount of TV watched per week
is significantly negatively correlated with hours of study per week (r = -.316, p = .025)
and with hours per week spent working (r = -.541, p < .001), which is the strongest
correlation in the matrix. However, amount of TV is not significantly correlated with
student’s GPA (r = -.253, p = .076) because the p or Sig. is greater than .05.

The boxes below the diagonal in the hours of study per week column indicate that it is
not significantly correlated with student’s GPA (r = .109, p = .453), the weakest in the
matrix, or with hours per week spent working (r = .219, p = .131). In addition, student’s
current GPA is significantly positively correlated with hours per week spent working (r =
.303, p = .034). The number of subjects with data for each pair of variables (49 or 50, in
this matrix) is indicated by the Ns. The degrees of freedom (df) are N-2.

Callouts on the correlation matrix:

• Exact significance level, p
• Notice that two levels of significance (p<.01 and p<.05) are marked with asterisks.

Example of How to Write About Problem 9.3

Results

Table 9.3 shows that three of the six pairs of variables were significantly

correlated. The strongest correlation, with a large effect size, was between amount of TV

watched per week and hours per week spent working, r (47) = -.541, p < .001. This

means that students who had a relatively high number of hours watching TV were likely

to spend relatively few hours working. Likewise, TV watching was negatively correlated

with studying, but amount of work and GPA were positively correlated. As shown in

Table 9.3 both of these correlations were of a medium size according to Cohen (1988).

Table 9.3

Intercorrelations, Means and Standard Deviations for Scale Variables

Variable           1       2       3       4        M        SD
1. TV watched     --     -.32*   -.54*   -.25     11.98     6.10
2. Study          --      --      .22     .11     15.62     8.31
3. Work           --      --      --      .30*    26.12    14.86
4. GPA            --      --      --      --       3.17      .39

* p < .05

9.4. Is there a combination of gender and same sex parent’s height that predicts student’s height better than either one of these variables alone?


Selection of the Statistic

Problem 9.4 is asking a prediction question; “What variables significantly predict

student’s height?” When asking this type of question, multiple regression is an

appropriate choice for a statistic. Multiple regression is also used to answer other

questions when there are multiple independent variables that are scale or dichotomous

and one scale dependent variable.

Key elements needed:

• Two or more predictor variables (another name for independent variables) that are

scale (in this case, same sex parent’s height) or dichotomous (in this example, gender).

• A response variable (another name for dependent or outcome variable) which is scale

(in this case student’s height)

• A linear (approximately straight line) relationship needs to exist between the

predictors and response/outcome

Assumptions of Regression

1. The relationship between the independent variables and the dependent variable is linear (in other words, the regression of the DV on the

combination of IVs is linear).

2. There are several other more complex assumptions not discussed here.

How to Produce the Selected SPSS Output

To answer Problem 9.4 with Windows:

• Click on Analyze ⇒ Regression ⇒ Linear. This will open the Linear Regression window.
• Highlight student height in inches and move it into the Dependent box
• Highlight same sex parent’s height and gender and move them into the Independent(s) box
• Click on Statistics. This will open the Linear Regression: Statistics window
• Select Descriptives
• Click on Continue and O.K.

How to answer Problem 9.4 with syntax:

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT height
  /METHOD=ENTER pheight gender.

SPSS Output for Problem 9.4

Regression

Descriptive Statistics

                              Mean      Std. Deviation    N
student height in inches     67.3000       3.9396        50
same sex parent's height     66.7800       5.1042        50
gender of student              1.48          .50         50

Correlations

                                                   student height   same sex          gender of
                                                   in inches        parent's height   student
Pearson Correlation   student height in inches        1.000            .842             -.782
                      same sex parent's height         .842           1.000             -.782
                      gender of student               -.782           -.782             1.000
Sig. (1-tailed)       student height in inches          .              .000              .000
                      same sex parent's height         .000             .                .000
                      gender of student                .000            .000               .
N                     student height in inches         50               50                50
                      same sex parent's height         50               50                50
                      gender of student                50               50                50

See Chapter

High correlations among predictors might affect the results.

These are the correlations of the dependent variable with each of the predictors.


Variables Entered/Removed(b)

Model   Variables Entered                                  Variables Removed   Method
1       gender of student, same sex parent's height(a)             .           Enter

a. All requested variables entered.

b. Dependent Variable: student height in inches

Model Summary

Model     R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .865a       .748           .737                  2.0196

a. Predictors: (Constant), gender of student, same sex parent's height

ANOVA(b)

Model 1        Sum of Squares    df    Mean Square      F       Sig.
Regression        568.794         2      284.397      69.725    .000a
Residual          191.706        47        4.079
Total             760.500        49

a. Predictors: (Constant), gender of student, same sex parent's height

b. Dependent Variable: student height in inches

The ANOVA or F statistic

Indicates the F statistic’s significance level. If significant (<.05), then the predictors are significantly predicting the dependent variable.

Indicates what method was used to enter the predictor variables. There are many methods that can be used. We will only use the “Enter” or simultaneous method.

All predictors simultaneously entered in the analysis.

The dependent variable

These are the predictor variables.

Significance of each predictor and the constant.


Coefficients(a)

                              Unstandardized Coefficients   Standardized Coefficients
Model 1                          B        Std. Error                 Beta                  t       Sig.
(Constant)                     40.466       7.176                                        5.639     .000
same sex parent's height         .457        .091                    .592                5.038     .000
gender of student              -2.491        .917                   -.319               -2.715     .009

a. Dependent Variable: student height in inches

Description of Output 9.4

The table of Descriptives shows the number (N) for each variable. In multiple regression,

SPSS uses only the participants with no missing data on the selected variables. In this

example there is no missing data, which is indicated by the N’s being equal to 50.

After the table of Descriptive statistics, there is table of correlations which shows how

each variable is related to all the others. The first column (or top row) shows the

correlations of each predictor with the dependent/outcome variable. It is desirable to

have high correlations between each predictor and the outcome variable. In this example

student height in inches is highly correlated with same sex parent’s height (r = .842) and

with gender of student (r = -.782). This is good. It is not good to have high correlations

between predictor variables because this might indicate the presence of multicollinearity.

If there are high correlations among the predictor variables, then the VIF or Tolerance

scores should be checked. In this example, there is a high correlation between the two

predictors, same sex parent’s height and gender of student, (r = -.782). Checking the VIF

score (not shown here) indicated that multicollinearity was not a problem (see Mertler

and Vannatta, 2001, for more on how to do this).
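With only two predictors, Tolerance and VIF follow directly from the correlation between the predictors, and the standardized betas can be recovered from the three correlations in the Correlations table. The sketch below shows that arithmetic; it is a two-predictor special case offered for illustration (the manual's own VIF output is not shown), and the variable names are assumptions.

```python
# Correlations from the Output 9.4 Correlations table.
r_y1 = 0.842     # student height with same sex parent's height
r_y2 = -0.782    # student height with gender of student
r_12 = -0.782    # same sex parent's height with gender (the two predictors)

# Tolerance is the share of a predictor's variance NOT explained by the
# other predictor; VIF is its reciprocal.
tolerance = 1 - r_12 ** 2
vif = 1 / tolerance

# Standardized betas for a two-predictor regression.
beta_parent = (r_y1 - r_y2 * r_12) / tolerance
beta_gender = (r_y2 - r_y1 * r_12) / tolerance

# R squared can be rebuilt from the betas and the criterion correlations.
r_squared = beta_parent * r_y1 + beta_gender * r_y2

print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.2f}")
print(f"beta (parent's height) = {beta_parent:.3f}")
print(f"beta (gender)          = {beta_gender:.3f}")
print(f"R squared              = {r_squared:.3f}")
```

Because the inputs are correlations rounded to three decimals, the betas agree with the Coefficients table only to about two decimals.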

The third table shows which variables are predictors and dependent and the method used

to run the multiple regressions. In this case, the method used was “Enter” which enters

all the variables at once into the equation. Next the Model Summary table provides


information on how much variance in the dependent variable can be predicted from the

combination of the predictors (see adjusted R2). In this example, adjusted R2 = .737

indicates that 74% of the variance in student height in inches can be predicted from the

combination of gender of student and same sex parent’s height. The ANOVA table

provides an overall indication of whether this combination is statistically significant; in

this example, p < .001, which tells us that the model significantly predicts the dependent

variable.

Finally, the Coefficients table is a key table. The unstandardized beta (B) and standard

error would be used if you want to write out the model or equation to predict an

individual student’s height given his or her gender, same sex parent’s height, and the

constant. In this example the model would be written as student height = 40.47 + .46*p

height – 2.49*gender. Remember to multiply variables before adding or subtracting. The

value for a person’s gender (1 = male, 2 = female) should be multiplied by 2.49 and

then subtracted from 40.47 + .46 times that person’s parent’s height in inches. The

standardized betas (β) are similar to correlation coefficients. The t and Sig. values

indicate whether the betas are significant predictors, assuming that the other variables are

in the model. In this case both betas are statistically significant which indicates that both

variables significantly contribute to the model. If a beta (β) is not significant, you might

want to consider not including it in the model, although doing so might cause other

currently significant predictors to become nonsignificant. An indication of the effect

size can be obtained from the multiple correlation coefficient (R), which can vary from 0

to 1.0. R equals .87, which according to Cohen’s (1988) guidelines is a large effect.

The effect sizes for multiple regression are similar to those for r and are shown on page

21.
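The prediction equation can be checked with a short calculation. The sketch below is illustrative only: it assumes the unstandardized coefficients reported in Table 9.4b (constant = 40.47, B = .46 for same sex parent's height, B = -2.49 for gender) and the coding 1 = male, 2 = female. The function name and the example parent heights are hypothetical.

```python
def predict_student_height(parent_height_in, gender_code):
    """Predicted student height in inches from the Table 9.4b coefficients.

    gender_code follows the data file coding: 1 = male, 2 = female.
    """
    constant = 40.47   # intercept (Constant row)
    b_parent = 0.46    # B for same sex parent's height
    b_gender = -2.49   # B for gender
    # Multiply each predictor by its coefficient first, then add.
    return constant + b_parent * parent_height_in + b_gender * gender_code


# A son whose father is 70 inches tall (hypothetical case).
son = predict_student_height(70, 1)
# A daughter whose mother is 64 inches tall (hypothetical case).
daughter = predict_student_height(64, 2)
print(round(son, 2), round(daughter, 2))
```

The son's predicted height comes out at about 70.18 inches, which agrees with the worked example in the Results write-up for this problem.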

Example of How to Write About Problem 9.4

Results

Multiple regression was conducted to investigate the best predictors of student height. The assumptions of multiple regression were checked and none were violated. The combination of same sex parent’s height and gender significantly predicted student height, F(2, 47) = 69.73, p < .001, with student height = 40.47 + .46*p height – 2.49*gender. For example, if p height equals 70 inches (5’10”), the son’s height would be predicted to be 70.18 inches (40.47 + .46*70 – 2.49*1). The adjusted R squared value was .737, which indicates that 74% of the variance in student height was explained by the model. According to Cohen (1988), this is a large effect.

Table 9.4a

Means, Standard Deviations, and Intercorrelations for Student Height in Inches and Predictor Variables (N = 50)

Variable                        M      SD     Parent’s height   Gender
Student height                  67.3   3.94   .842*             -.782*
Predictor variable
1. Same sex parent’s height     66.8   5.10   --                -.782*
2. Gender                       1.48   .50                      --

*p < .001.

Table 9.4b

Simultaneous Multiple Regression Analysis Summary for Gender and Same Sex Parent’s Height Predicting Student’s Height in Inches (N = 50)

Variable                      B       SEB    β
Same sex parent’s height      .46     .09    .59*
Gender                        -2.49   .92    -.32*
Constant                      40.47   7.18

Note. R2 = .75; F(2, 47) = 69.73, p < .001.
*p < .01.

Discussion

From the data in this sample, it appears that one can predict student height from

the same sex parent’s height and gender very well. This study indicates that once a

pregnant woman knows the sex of her child, the height of the child can be predicted from

the child’s gender and same sex parent’s height. Jones (1998) has demonstrated this in

previous studies.

9.5. Is there a combination of hours of TV watching, hours of studying, and hours of

work that predicts current GPA?

Chapter10/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 10 – Comparing Groups with t Tests and Similar Nonparametric Tests Study Guide

OBJECTIVES: The student will be able to:

1. Select the appropriate inferential statistic to compare groups (Table 10.1).
2. Compute and interpret a one-sample t test to compare one group or sample to a hypothesized population mean.
3. Explain the assumptions and conditions for use of the independent samples t test.
4. Compute and interpret an independent samples t test to compare two independent groups (between groups design).
5. Explain the assumptions and conditions for use of the Mann-Whitney U test.
6. Compute and interpret a Mann-Whitney U test to compare two groups.
7. Explain the assumptions and conditions for use of a paired samples t-test.
8. Compute and interpret a paired samples t test to compare groups in a within subjects design.
9. Compute and interpret a paired samples t test to check the reliability of a repeated measure.
10. Compute and interpret a Wilcoxon nonparametric test for a within subjects design.
11. Write about the results of the statistics performed in this chapter.

TERMINOLOGY:
• between groups design
• within subjects design
• one-sample t test
• independent samples t test
• Levene test
• equal variances assumed vs. equal variances not assumed
• 95% confidence interval of the difference
• Mann-Whitney U test
• ranks
• paired samples t test
• test-retest (parallel forms) reliability
• Wilcoxon (nonparametric test for two related samples)

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter10/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 10 – Comparing Groups with t Tests and Similar Nonparametric Tests

Chapter 10 Outline

I. Selection of an Appropriate Inferential Statistic for Basic, Two Variable Difference Questions or Hypotheses
   A. Utilize Table 10.1 to help select appropriate inferential statistic
      1. Determine if the design is between groups versus within subjects.
      2. Determine the level of measurement.
      3. Determine if the assumptions are met versus markedly violated.
      4. Match the appropriate row and column to select the statistic.

II. Problem 10.1: One-Sample t Test
   A. A one-sample t test is used to compare the mean of a sample with a hypothesized population mean.
   B. Follow the directions in the book to compute and interpret a one-sample t test.

III. Problem 10.2: Independent Samples t Test
   A. An independent samples t test is used to investigate the difference between two unrelated groups on an approximately normal dependent variable.
   B. Assumptions of the independent samples t test
      1. The variances of the dependent variable in the two populations are equal.
      2. The dependent variable is normally distributed within each population.
      3. The data are independent (the scores of one participant are not dependent upon the scores of the other participants).
   C. Testing the assumptions
      1. SPSS will automatically test assumption 1 using the Levene test for equal variances.
      2. Assumption 2 can be tested using the Explore command.
      3. Assumption 3 must be addressed during design and data collection.
   D. SPSS can do several t tests in one output if they are all using the same independent or grouping variable.
   E. Follow the directions in the book to compute and interpret an independent samples t test.

IV. Problem 10.3: The Nonparametric Mann-Whitney U Test
   A. Use of the Mann-Whitney U Test
      1. For between groups design with two levels of the independent variable.
      2. Can be used with ordinal data.
      3. Necessary for groups and scores to be independent.
      4. Use when the assumptions for the independent samples t test are markedly violated.
   B. Follow the directions in the book to compute and interpret a Mann-Whitney U test.

V. Problem 10.4: Paired Samples t Test

   A. Use of the Paired Samples t Test
      1. Use if the groups are not independent (e.g., parent and child).
      2. Use if the two scores are a repeated measure.
      3. Use for a pretest-posttest design.
   B. Assumptions and Conditions of Use of Paired Samples t Test
      1. The independent variable is dichotomous and its levels (or groups) are paired, or matched, in some way.
      2. The dependent variable is normally distributed in the two conditions.
   C. Follow the directions in the book to compute and interpret a paired samples t test.

VI. Problem 10.5: Using the Paired t Test to Check Reliability
   A. Reliability Check
      1. Test-retest reliability
      2. Parallel (equivalent) forms reliability
      3. Reliability can be tested by producing a correlation coefficient, but the paired t test produces a correlation as well as a comparison of the means.
   B. Follow the directions in the book to compute and interpret a paired t test to check reliability.

VII. Problem 10.6: Nonparametric Test for Two Related Samples (Wilcoxon)
   A. Use when the assumptions of the paired t test are violated.
   B. Follow directions in the book to compute and interpret the Wilcoxon Signed Ranks Test.

Chapter10/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 10 – Comparing Groups with t Tests and Similar Nonparametric Tests

Using the College Student data file, do the following problems. Print your outputs after typing your interpretations on them. Please circle the key parts of the output that you use for your interpretation.

10.1. Is there a significant difference between the genders on average student height? Explain. Provide a full interpretation of the results.

In this case the first question to ask yourself is whether the dependent variable (student height) is normally distributed. You can find this by running either Frequencies or Descriptives. Based upon this finding, you can determine which statistic to select from the SPSS menus. If it is normally distributed, the independent samples t test is the test we recommend. If the dependent variable is not normally distributed, then the Mann-Whitney is the better choice. How do you decide what normal is?

If the skewness from the descriptive or frequency printout is equal to or greater than one, then we recommend using a nonparametric test. If it is less than one (say .87), then you would select the parametric statistic (in this case the independent samples t test).
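The skewness rule of thumb can be written as a small decision helper. This is a minimal sketch, not part of SPSS: the function name is hypothetical, and it assumes the rule is applied to the absolute value of skewness so that strong negative skew is treated the same way as strong positive skew.

```python
def choose_two_group_test(skewness, cutoff=1.0):
    """Pick a two-group difference test from the skewness of the dependent variable.

    Assumption: the magnitude of skewness is what matters, so the
    absolute value is compared with the cutoff of one.
    """
    if abs(skewness) >= cutoff:
        return "Mann-Whitney U test"
    return "independent samples t test"


print(choose_two_group_test(0.87))
print(choose_two_group_test(1.4))
```

With a skewness of .87 the helper returns the parametric test, matching the example in the text.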

With difference tests it is very important to review the descriptive table in the printouts. First, review the means. Does it look like there is a large difference in the means? This process will let you estimate what the difference statistic (parametric or nonparametric) and sig. values should logically be. Similarly, the standard deviations tell you (just by eyeballing them) whether there is a small or a large difference in the distributions of scores. You should find that when there is a relatively small difference between means, the t or Mann-Whitney Sig. column shows a large sig. value; that is, there is a high probability of getting these results by chance. In other words, statistically speaking, there is little chance that there is a real difference in the means. More importantly, there is little evidence that the independent variable was the reason for the outcome.

This case gets a bit more difficult because there is a large difference in the standard deviations of the two levels of the independent variable. One has a value of 2.80 and the other 2.07. Although that might sound confusing, SPSS makes it easy for you. If the distributions are significantly different, then you read the bottom line of the SPSS printout (equal variances not assumed). If the distributions are like most (not a large difference between the standard deviations), then you read the top line (equal variances assumed). If there is a big difference, the Levene’s test will tell you that in much the same way as every test we have discussed: if there is a p or sig. value less than .05, then there is a difference between the two distributions.

This concept is one of the most difficult to understand in this chapter. It’s easy to run the statistics, but it can be difficult to read the outputs. Look first at whether the Levene’s test is significant or not (note the headings on the top of the table). THEN interpret the “t-test for Equality of Means” columns. If the Levene’s Sig. is less than .05, then you read the bottom line. If it is more than .05 (which is the norm), then you read the top line.
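The reading rule for the Independent Samples Test table can be sketched as a tiny helper. The function name is hypothetical; the row labels are the ones SPSS prints, and the .05 cutoff is the one stated in the text.

```python
def row_to_read(levene_sig, alpha=0.05):
    """Return which line of the Independent Samples Test table to interpret."""
    if levene_sig < alpha:
        # Levene's test is significant: variances differ, read the bottom line.
        return "Equal variances not assumed"
    # Levene's test is not significant: read the top line.
    return "Equal variances assumed"


# Levene's Sig. for student height by gender in this problem is .273.
print(row_to_read(0.273))
```

For this problem the helper points to the top line, because .273 is greater than .05.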

3. Write another question that can be answered from the data using a paired t test. Run the t test and provide a full interpretation.

The problem here is understanding that paired t tests are used in very particular situations. In this data set there are not many options to choose from. You should use the paired t to compare means from repeated measures (test-retest or pretest-posttest). However, there are other factors that would cause one to expect differences, and perhaps we should be a bit more conservative in the interpretation of the results. One of those is a genetic link. In this case we have data related to height and same sex parent height. There is good reason to believe that there would be a strong correlation between those two variables. Hint Hint!

Selection of the Statistic

The choice of a statistic for Problem 10.1 has three important considerations:

• One dichotomous/nominal independent variable (gender)

• One scale (normally distributed) dependent variable (student height)

• The problem is to compare the average (or mean) DV scores of two

independent groups (IV)

When investigating the difference between two unrelated groups (in this case males and

females) on a scale variable (in this case it is average student height) it is appropriate to

choose an independent samples t test.

Assumptions of the t test

1. Variances of the two populations are equal
2. The dependent variable is normally distributed within each population
3. The data for the two groups are independent (the groups are not matched or related)
4. Groups are of similar size

How to Produce the Selected SPSS Output

To answer Problem 10.1 with Windows:

• Click on Analyze ⇒ Compare Means ⇒ Independent Samples T Test
• Highlight student height in inches (the scale variable) and move it into the Test Variable(s) box by clicking on the arrow.
• Highlight gender (the dichotomous variable) and move it into the Grouping Variable box
• Click on Define Groups. This will open the Define Groups window
• Type a 1 in the Group 1 box (this is because the value for males is 1)
• Type a 2 in the Group 2 box (this is because the value for females is 2). If we had used different values for the levels of the variables, we would need to use those values
• Click on Continue and O.K.

To answer Problem 10.1 with syntax:

T-TEST GROUPS=gender(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=height
  /CRITERIA=CIN(.95) .

SPSS Output for Problem 10.1

T-Test

Group Statistics

student height in inches

gender of student   N    Mean      Std. Deviation   Std. Error Mean
males               26   70.2308   2.8044           .5500
females             24   64.1250   2.0708           .4227

Independent Samples Test

student height in inches

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                               95% Confidence Interval
                                                                                                                               of the Difference
                              F       Sig.            t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower    Upper
Equal variances assumed       1.230   .273            8.697   48       .000              6.1058            .7020                   4.6942   7.5173
Equal variances not assumed                           8.802   45.863   .000              6.1058            .6937                   4.7094   7.5021

Variable name

Levels

The number in each group should be approximately the same.

The variances are the standard deviations squared; i.e., approximately 7.84 for males and 4.28 for females. The Levene’s test in the next table checks to see if the variances are significantly different. In this case they are not.

This is the first place to look. These two columns indicate if the assumption that the variances are equal is violated. If Sig. is < .05, it is assumed that the variances are not equal, thus use the bottom line for the t test. In this case Sig. >.05 so use the top line.

Notice that there are really 2 tests here – the Levene’s Test for Equality of Variances and the t test.

This is the t value.

This is the significance level of the t value. If Sig. is <.05, then there is a difference between the 2 groups.

This is the difference between the means of the two groups – the larger this is, the more likely the t test will be significant, given a certain standard Error of the Difference.

Ignore the “equal variances not assumed” line in this case because the Levene’s test indicated that the variances were not significantly different (p>.05). Note we crossed out this line of numbers.
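Both lines of the Independent Samples Test table can be reproduced from the Group Statistics table. The sketch below is a hand check under stated assumptions, not SPSS's own code: it uses the standard pooled-variance t for the "equal variances assumed" line and the Welch (unequal variances) t with the Welch-Satterthwaite degrees of freedom for the "equal variances not assumed" line. Variable names are illustrative.

```python
import math

# Summary values from the Group Statistics table.
n1, m1, sd1 = 26, 70.2308, 2.8044   # males
n2, m2, sd2 = 24, 64.1250, 2.0708   # females
mean_diff = m1 - m2

# Top line: pooled variance, df = n1 + n2 - 2.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = mean_diff / se_pooled
df_pooled = n1 + n2 - 2

# Bottom line: separate variances (Welch) with adjusted df.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = mean_diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(t_pooled, 3), df_pooled)
print(round(t_welch, 3), round(df_welch, 3))
```

The results should agree with the printed output to within rounding (t of about 8.70 with 48 df on the top line, and about 8.80 with roughly 45.9 df on the bottom line), which shows why the two lines differ so little when the variances are similar.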

Description of Output 10.1

The first table shows the mean height of the two samples or groups (males and females)

to be compared and other information about each sample. The second table has two

parts. The first two columns show the Levene's test of the assumption of equal

variances; it is not significant (F = 1.23, p = .27), so equal variances are assumed. The

seven columns on the right refer to the two versions of the t test. The top line contains

the t that is used when variances are approximately equal. Note that the confidence

interval (lower limit = 4.69, upper limit = 7.52) shows that if we did the study 100 times,

about 95 of the resulting intervals would contain the true (population) difference, so we can be

quite confident that the difference in heights is at least 4.69 inches. The approximate

effect size (d) can be computed using the formula d = Mean difference/ SD pooled. In

this case the mean difference is 6.11 and the pooled standard deviation is approximately

2.45 (the average of the male and female SDs with males weighted a little more because

there are more of them). Then 6.11 ÷ 2.45 = approximately 2.5, which is d, a very large

effect size. To calculate d more exactly, you could use the following formula:

$$ d = \frac{M_A - M_B}{\sqrt{\dfrac{(n_A - 1)\,SD_A^2 + (n_B - 1)\,SD_B^2}{n_A + n_B - 2}}} $$
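The more exact d can be worked out from the Group Statistics values. This is a minimal sketch that assumes the pooled standard deviation is the square root of the weighted average of the two group variances, as in the formula above; the variable names are illustrative.

```python
import math

# Group A = males, group B = females (values from the Group Statistics table).
n_a, mean_a, sd_a = 26, 70.2308, 2.8044
n_b, mean_b, sd_b = 24, 64.1250, 2.0708

# Pooled SD: each variance is weighted by its degrees of freedom.
sd_pooled = math.sqrt(
    ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
)

# Cohen's d: mean difference in pooled standard deviation units.
d = (mean_a - mean_b) / sd_pooled
print(round(sd_pooled, 2), round(d, 2))
```

The pooled SD comes out near 2.48 and d near 2.46, so the quick estimate of approximately 2.5 in the text is close to the exact value.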

Example of How to Write About Problem 10.1

Results

An independent samples t test was executed comparing the genders on average

student height in inches. A statistically significant difference was found, t (48) = 8.70, p

< .001, which indicates that the average male (M =70.23) is significantly taller than the

average female (M = 64.13). The 95% confidence interval indicates that the mean

difference between male and female students in the population is probably between 4.69

and 7.52. The effect size (approximately 2.5) is very large according to Cohen (1988).

Discussion

One research question examined was whether males and females differed

significantly in height. Males were found to be much taller than females. This could be

explained by the fact that throughout history, males have been taller than females,

probably due to genetic differences between the sexes (Smith, 1983). There is some

evidence that over the last century both males and females became taller and that the

differences between males and females are less now (Jones, 1995; Smith, 1983). Data

from the Jones study, compared to the current data, seem to indicate that the gender

differences in the U.S. population are considerably smaller than in the current sample.

Table 10.1a

Group Differences for Student Height in Inches for Male and Female Students

                            Male (n = 26)     Female (n = 24)
                            M       SD        M       SD        t (48)   p
Student height in inches    70.23   2.80      64.13   2.07      8.70     <.001

a Usually you would not make a table with only one t test, but if you did, it might look like this. The results written here assume that the table was not included.

10.2. Is there a difference between the number of hours students study and the hours they work? Also, is there an association between the two?

10.3. Write another question that can be answered from the data using a paired samples t test. Run the t test and provide a full interpretation.

Selection of the Statistic

Key points for the paired samples t test

Mean Standard Deviation Degrees of Freedom

When the p value on the SPSS printout is .000 you should report it as p <.001.

• Need to have a dichotomous independent variable that is paired, or matched,

in some way (e.g., husband-wife, pre-post, etc.).

• Need a scale dependent variable for which you are interested in the mean

score.

How to Produce the Selected SPSS Output

To answer Problem 10.3 with Windows:

• Click on Analyze ⇒ Compare Means ⇒ Paired Samples t test. This will open the Paired Samples t test window
• Highlight student height in inches and same sex parent’s height
• Click on the arrow to move them into the Paired Variable(s) box
• Click on Continue and O.K.

To answer Problem 10.3 with syntax:

T-TEST PAIRS= height WITH pheight (PAIRED)
  /CRITERIA=CIN(.95)
  /MISSING=ANALYSIS.

SPSS Output for Problem 10.3

T-Test

Paired Samples Statistics

Pair 1                        Mean      N    Std. Deviation   Std. Error Mean
student height in inches      67.3000   50   3.9396           .5571
same sex parent's height      66.7800   50   5.1042           .7218

Paired Samples Correlations

Pair 1                                                  N    Correlation   Sig.
student height in inches & same sex parent's height     50   .842          .000

Paired Samples Test

Pair 1: student height in inches - same sex parent's height

Paired Differences
                                    95% Confidence Interval
                                    of the Difference
Mean    Std. Deviation   Std. Error Mean   Lower    Upper     t       df   Sig. (2-tailed)
.5200   2.7792           .3930             -.2698   1.3098    1.323   49   .192

Description of Output 10.3

The first table provides descriptive information about two repeated or paired scores for

each participant. Or, as in this case, the same type of measure (height) for two subjects

who are paired (e.g., student and parent) or matched. The purpose of the paired t is to

compare the two means to see if they are significantly different. When using your own

data it is important to remember to use a paired t test if there is pairing or matching

between the two levels of the independent variable.

This is the significance level of the correlation.

This indicates the strength of the relationship between the heights of student-parent pairs. This is not the t test; it provides different information. See Problems 10.1 – 10.3.

This is the paired or related samples t.

This is the significance of the t test.

Size of the difference between the means.

The second table provides additional information, the correlation between the two

variables. It does not tell you whether the means in the first table are different. See

Problems 10.1 to 10.3 for what it does tell you.

The right side of the last table shows the paired t test, degrees of freedom, and whether

the difference between the two means is significant. It is not, because p = .192. The

confidence interval (-.27 to 1.31), which is presented in the middle of the last table, also

indicates that the t is not significant because it includes zero.
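The paired t and its confidence interval can be rebuilt from the Paired Differences columns. The sketch below is illustrative only. The critical value 2.0096 is an assumption taken from a standard t table (two-tailed, alpha = .05, df = 49); it is hardcoded because Python's standard library has no t distribution.

```python
import math

# Values from the Paired Samples Test table.
n = 50
mean_diff = 0.5200   # mean of (student height - same sex parent's height)
sd_diff = 2.7792     # standard deviation of the paired differences

se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t = mean_diff / se_diff            # paired samples t, df = n - 1
df = n - 1

# Assumed two-tailed critical t for df = 49 at alpha = .05 (from a t table).
t_crit = 2.0096
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff

# An interval that includes zero goes with a nonsignificant t.
includes_zero = lower <= 0 <= upper
print(round(t, 3), round(lower, 4), round(upper, 4), includes_zero)
```

The interval runs from about -.27 to 1.31 and straddles zero, which is the same conclusion as the nonsignificant t reported in the output.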

Example of How to Write About Problem 10.3

Results

A paired samples t test was performed to compare student height in inches with

same sex parent’s height. No significant difference was found, t (49) = 1.32, p = .192.

This indicates that parents have children who are about the same height as they are.

Discussion

The finding that students are about the same height as their parents was somewhat

unexpected. Research in this area has shown that over the past 100 years children, in

general, had better nutrition and health than their parents did as children (Smith & Jones,

1999). Research also indicated that on average children were taller and weighed more

than children did 100 years ago (Bass, 1998). However, Bass provided some evidence

that generational increases in height have leveled off in the last 20-30 years. Our finding

agreed with Bass, but also might be due to sampling problems.

Table 10.3a

Group Differences for Height Between Matched Groups

                            M       SD     t (49)   p
Student height in inches    67.30   3.94   1.32     .192
Same sex parent’s height    66.78   5.10

a Usually you wouldn’t make a table with only one t test, but if you did, it might look like this.

Chapter11/Chapter Guides.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 11 – Analysis of Variance (ANOVA) Study Guide

OBJECTIVES: The student will be able to:

1. Explain the assumptions and conditions for ANOVA.
2. Compute and interpret a one-way (single factor) ANOVA.
3. Select appropriate post-hoc tests for a statistically significant one-way ANOVA.
4. Compute and interpret post-hoc tests to determine which pairs of means from the one-way ANOVA are significantly different.
5. Compute and interpret the Kruskal-Wallis test (a non-parametric test similar to a one-way ANOVA).
6. Compute and interpret a two-way (factorial) ANOVA.
7. Write about statistical tests performed in this chapter.

TERMINOLOGY:
• one-way ANOVA
• ANOVA F (omnibus F or overall F)
• post-hoc multiple comparisons
  o LSD (least significant differences) test
  o Scheffe test
  o Tukey HSD (honestly significant differences) test
  o Games-Howell test
• Kruskal-Wallis H test
• two-way (factorial) ANOVA

ASSIGNMENTS: See additional activities and extra SPSS problems for assignment examples.

Chapter11/Chapter Outlines.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 11 – Analysis of Variance (ANOVA) Chapter Outline

I. Problem 11.1: One-Way (or Single Factor) ANOVA
   A. Use of the One-Way ANOVA
      1. Compares two or more independent groups on the dependent variable.
      2. Compares the means of the samples or groups in order to make inferences about the population means.
      3. Also called a single factor ANOVA because there is only one independent variable or factor (the IV has nominal levels or a few ordered levels).
   B. Assumptions of the ANOVA
      1. Observations are independent (the value of one observation is not related to any other observation).
      2. Variances of the dependent variable are equal across the groups.
      3. The dependent variable is normally distributed for each group.
   C. Robustness of ANOVA
      1. Can be used when variances are only approximately equal if the number of subjects in each group is equal.
      2. Can be used if the dependent variable data are even approximately normally distributed.
      3. Post hoc test selection will be dependent upon whether the assumption of equal variances has been violated.
   D. Follow the directions in the book to compute and interpret a one-way ANOVA.

II. Problem 11.2: Post Hoc Multiple Comparison Tests
   A. Purpose of post hoc multiple comparison tests (follow-up tests).
      1. Utilized when you compare three or more group means and the overall (omnibus) F is statistically significant.
      2. Allows you to determine which pairs of means are significantly different.
      3. You do not do a post hoc test if the overall F is not statistically significant.
   B. Characteristics of various post hoc tests.
      1. LSD (least significant differences) post hoc test is quite liberal.
      2. Scheffe test is quite conservative.
      3. Tukey HSD (honestly significant differences) post hoc test is a middle of the road test if the Levene test was not significant.
      4. Games-Howell test is a middle of the road test if the Levene test was significant.
   C. Follow the directions in the book to compute and interpret a post hoc test for a one-way ANOVA.

III. Problem 11.3: Nonparametric Test (Kruskal-Wallis)
   A. Use of the Kruskal-Wallis
      1. Use if the homogeneity of variance assumption is violated.

      2. Use if data is ordinal.
      3. Use if the assumption of normality is violated.
   B. Follow the directions in the book to compute and interpret a Kruskal-Wallis test.

IV. Problem 11.4: Two-Way (or Factorial) ANOVA
   A. Purpose of a two-way ANOVA
      1. Allows comparison of two or more groups based on two independent variables.
      2. Appropriate for a between groups design.
   B. Follow the directions in the book to compute and interpret a two-way ANOVA.
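The post hoc guidance in section II of this outline can be summarized as a small decision helper. This is a hypothetical sketch, not an SPSS feature: the function name is invented, and it only covers the two "middle of the road" choices (Tukey HSD and Games-Howell), leaving the more liberal LSD and more conservative Scheffe tests as alternatives the analyst may prefer.

```python
def choose_post_hoc(omnibus_sig, levene_sig, alpha=0.05):
    """Suggest a middle-of-the-road post hoc test for a one-way ANOVA.

    omnibus_sig: Sig. value of the overall (omnibus) F.
    levene_sig:  Sig. value of the Levene test for equal variances.
    """
    if omnibus_sig >= alpha:
        # No post hoc test when the overall F is not statistically significant.
        return None
    if levene_sig < alpha:
        # Equal variances assumption violated.
        return "Games-Howell"
    return "Tukey HSD"


# Hypothetical Sig. values, for illustration only.
print(choose_post_hoc(omnibus_sig=0.01, levene_sig=0.40))
print(choose_post_hoc(omnibus_sig=0.01, levene_sig=0.02))
print(choose_post_hoc(omnibus_sig=0.20, levene_sig=0.40))
```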

Chapter11/Extra SPSS Problems.pdf

IBM SPSS for Introductory Statistics: Use and Interpretation, 5th Ed. (Morgan, Leech, Gloeckner & Barrett) Instructor's Manual by Gene W. Gloeckner and Don Quick

Chapter 11 – Analysis of Variance

Using your College Student data file, do the following problems. Print your outputs after typing your interpretations on them. Please circle the key parts of the output that you use for your interpretation.

11.1. Identify an example of a variable measured at the scale/normally distributed level for which there is a statistically significant overall difference (F) between the three marital status groups. Complete the analysis and interpret the results. Do appropriate post hoc tests.

11.2. Use the Kruskal-Wallis test, with Mann-Whitney post hoc follow-up tests if needed, to run the same problem as 11.1. Compare the results.

11.3. Do gender and marital status seem to have an effect on student’s height and do gender and marital status interact? Run the appropriate SPSS analysis and interpret the results.

Selection of the Statistic

Another type of ANOVA, factorial univariate analysis of variance, is an appropriate statistic to choose to analyze Problem 11.3. The interaction between the independent variables will be graphed in the Profile Plot, which can help you see how the two variables jointly relate to the dependent variable.

Key elements include:

• Two independent variables, each with a few levels (in this case gender and marital status). The variables can be nominal (which is the case in this example) or ordinal, with a few levels.
• One scale dependent variable (in this case student's height).

How to Produce the Selected SPSS Output

To answer Problem 11.3 with Windows:

• Click on Analyze ⇒ General Linear Model ⇒ Univariate.
• Highlight student height in inches and move it into the Dependent Variable box.
• Highlight gender of student and marital status and move them into the Fixed Factor(s) box.

• Click on Plots. This will open the Univariate: Profile Plots window.
• Highlight marital under Factors and move it into the Horizontal Axis box.
• Highlight gender under Factors and move it into the Separate Lines box.
• Click on Add.
• Click on Continue.
• Click on Options. This will open the Univariate: Options window.
• Select Descriptive statistics and Estimates of effect size.
• Click on Continue and then OK.

How to answer Problem 11.3 with syntax:

UNIANOVA height BY gender marital
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PLOT = PROFILE( marital*gender )
  /PRINT = DESCRIPTIVE ETASQ
  /CRITERIA = ALPHA(.05)
  /DESIGN = gender marital gender*marital .

SPSS Output for Problem 11.3

Univariate Analysis of Variance

Between-Subjects Factors

                          Value Label     N
gender of student    1    males          26
                     2    females        23
marital status       1    single         20
                     2    married        18
                     3    divorced       11

Notes on this table:
• Two independent variables, gender and marital status.
• Levels of the two independent variables.
• Number of observations in each level of each independent variable.

Descriptive Statistics

Dependent Variable: student height in inches

gender of student   marital status      Mean    Std. Deviation    N
males               single           69.7143       2.6437        14
                    married          72.1429       2.1931         7
                    divorced         69.0000       3.1623         5
                    Total            70.2308       2.8044        26
females             single           64.6667       1.7512         6
                    married          63.9091       1.5783        11
                    divorced         63.5000       3.0166         6
                    Total            64.0000       2.0226        23
Total               single           68.2000       3.3498        20
                    married          67.1111       4.4969        18
                    divorced         66.0000       4.0988        11
                    Total            67.3061       3.9802        49

This table has divided the observations into six groups: males who are single, males who are married, males who are divorced, females who are single, females who are married, and females who are divorced. There are also totals.
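
As a quick check on how the Total rows of the Descriptive Statistics table relate to the six cells, the sketch below recomputes each gender's total mean as the n-weighted average of its three cell means. The numbers are copied from the output; the helper name `weighted_mean` is made up for this illustration.

```python
# (mean, n) pairs copied from the Descriptive Statistics table,
# in the order single, married, divorced.
cells = {
    "males": [(69.7143, 14), (72.1429, 7), (69.0000, 5)],
    "females": [(64.6667, 6), (63.9091, 11), (63.5000, 6)],
}


def weighted_mean(pairs):
    """Average of cell means, weighting each mean by its cell n."""
    total_n = sum(n for _, n in pairs)
    return sum(mean * n for mean, n in pairs) / total_n


# Total mean for each gender across the three marital status groups.
gender_totals = {g: weighted_mean(pairs) for g, pairs in cells.items()}

# Grand mean across all six cells.
all_pairs = [pair for pairs in cells.values() for pair in pairs]
grand_mean = weighted_mean(all_pairs)

for gender, mean in gender_totals.items():
    print(f"{gender}: {mean:.4f}")
print(f"grand mean: {grand_mean:.4f}")
```

The results agree with the Total rows in the output (70.2308, 64.0000, and 67.3061) up to rounding of the cell means.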

Tests of Between-Subjects Effects

Dependent Variable: student height in inches

Source              Type III Sum of Squares    df    Mean Square            F    Sig.    Eta Squared
Corrected Model                    514.951a     5        102.990       18.042    .000       .677
Intercept                       193618.932      1     193618.932    33918.869    .000       .999
GENDER                             420.666      1        420.666       73.694    .000       .632
MARITAL                             21.230      2         10.615        1.860    .168       .080
GENDER * MARITAL                    24.268      2         12.134        2.126    .132       .090
Error                              245.457     43          5.708
Total                           222736.000     49
Corrected Total                    760.408     48

a. R Squared = .677 (Adjusted R Squared = .640)
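
The F values in the Tests of Between-Subjects Effects table can be reproduced by hand: each mean square is a sum of squares divided by its df, and each F is the effect's mean square divided by the error mean square. The sketch below does this with the values copied from the output; the variable names are chosen only for this example.

```python
# (Type III sum of squares, df) copied from the ANOVA table.
effects = {
    "GENDER": (420.666, 1),
    "MARITAL": (21.230, 2),
    "GENDER * MARITAL": (24.268, 2),
}
ss_error, df_error = 245.457, 43

# Mean square error = SS error / df error.
ms_error = ss_error / df_error

f_ratios = {}
for source, (ss, df) in effects.items():
    ms = ss / df                       # mean square for this source
    f_ratios[source] = ms / ms_error   # F = MS effect / MS error
    print(f"{source}: MS = {ms:.3f}, F({df}, {df_error}) = {f_ratios[source]:.3f}")
```

Small differences in the last decimal place can occur because the sums of squares printed by SPSS are themselves rounded.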

Profile Plots

[Figure: Estimated Marginal Means of student height in inches. Horizontal axis: marital status (single, married, divorced). Vertical axis: Estimated Marginal Means. Separate lines: gender of student (males, females).]

Many times it is easier to see the interaction (and what is really going on) by looking at the plot. These lines are approximately parallel, as indicated by the fact that the gender* marital interaction was not significant, F = 2.13, p = .132.

Notes on the Tests of Between-Subjects Effects table:

• F column: the F or ANOVA value.
• Sig. column: the p value.
• Eta Squared column: this indicates the proportion of variance in the dependent variable that can be explained by each source. Eta is one measure of effect size.
• GENDER * MARITAL row: the interaction. Always check the interaction effect first, then the significance levels of the independent variables. Ignore the Corrected Model and Intercept lines.
• R Squared: this indicates the percent of variance in the dependent variable predictable from both of the independent variables and the interaction.
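
The effect size column and R Squared can also be recovered from the sums of squares. The sketch below assumes that the column SPSS labels Eta Squared is partial eta squared, that is, SS effect / (SS effect + SS error), which is what the ETASQ keyword of UNIANOVA reports; the variable names are illustrative.

```python
# Type III sums of squares copied from the ANOVA table.
ss_effects = {
    "GENDER": 420.666,
    "MARITAL": 21.230,
    "GENDER * MARITAL": 24.268,
}
ss_error = 245.457
ss_corrected_model = 514.951
ss_corrected_total = 760.408

# Partial eta squared: variance attributable to the effect, relative to
# that effect plus error (assumption: this is SPSS's "Eta Squared" column).
partial_eta_sq = {
    source: ss / (ss + ss_error) for source, ss in ss_effects.items()
}

# R squared: share of the corrected total explained by the whole model.
r_squared = ss_corrected_model / ss_corrected_total

for source, value in partial_eta_sq.items():
    print(f"{source}: partial eta squared = {value:.3f}")
print(f"R squared = {r_squared:.3f}")
```

The computed values match the table (.632, .080, .090, and R Squared = .677) to three decimals.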

Description of Output 11.3

The first table shows the number of students in each level of the two independent variables. The second table shows the means, SDs, and ns for each of the six cells as well as for the totals. The third table is the factorial ANOVA table. Note, from the Sig. column, that the interaction (GENDER * MARITAL) is not significant (p = .132). Gender is significant (p < .001), and eta squared for gender is .63. Thus, 63 percent of the variance in student height in inches can be predicted from gender. Eta (not squared) for gender is about .795. This is an effect size measure.

Consistent with the nonsignificant interaction, the Profile Plot shows that the lines for males and females are approximately parallel. You can also see that the line for males' heights is consistently higher than the one for females. This shows graphically what the ANOVA (F) and eta for gender indicated: males are significantly taller, and there is a very large effect of gender on height.
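
The step from eta squared to eta in the description above is just a square root. A minimal sketch, using the rounded eta squared for gender from the output:

```python
import math

# Eta squared for gender, as printed in the ANOVA table.
eta_squared_gender = 0.632

# Eta is the square root of eta squared.
eta_gender = math.sqrt(eta_squared_gender)

# Eta squared read as a percentage of variance.
percent_variance = eta_squared_gender * 100

print(f"eta = {eta_gender:.3f}")
print(f"variance predictable from gender = {percent_variance:.0f}%")
```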

Example of APA Tables, Results and Discussion for Problem 11.3

Results

Table 11.3a shows the means and standard deviations for the levels of marital status by gender. Table 11.3b shows that there was not a significant interaction between gender and marital status on student height, F(2, 43) = 2.13, p = .132. Looking at the main effect of gender, there was a significant difference between the genders on student height in inches, F(1, 43) = 73.69, p < .001. Males were taller than females by approximately 6 inches in this sample (see Table 11.3a). Eta for gender was about .795, which according to Cohen (1988) is a large effect.
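
The reporting conventions used in the Results paragraph (two decimals for F, no leading zero on p, and p < .001 when SPSS prints .000) can be sketched as a small formatter. The helper names `apa_p` and `apa_f` are hypothetical and exist only for this example.

```python
def apa_p(p):
    """Format a p value without a leading zero; report tiny values as p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_f(f_value, df_effect, df_error, p):
    """Build a string such as 'F(1, 43) = 73.69, p < .001'."""
    return f"F({df_effect}, {df_error}) = {f_value:.2f}, {apa_p(p)}"


# Values copied from the ANOVA table. SPSS displays Sig. for gender as .000,
# which means the p value rounds to zero at three decimals, not that it is zero.
gender_line = apa_f(73.694, 1, 43, 0.000)
interaction_line = apa_f(2.126, 2, 43, 0.132)

# Difference between the male and female total means from the descriptives.
mean_difference = 70.2308 - 64.0000

print(gender_line)
print(interaction_line)
print(f"Males minus females: {mean_difference:.2f} inches")
```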

Table 11.3a

Means, Standard Deviations, and n for Student Height as a Function of Gender and Marital Status

                       Males                  Females               Total
Marital Status     n      M      SD       n      M      SD        M      SD
Single            14    69.71   2.64      6    64.67   1.75     68.20   3.35
Married            7    72.14   2.19     11    63.91   1.58     67.11   4.50
Divorced           5    69.00   3.16      6    63.50   3.02     66.00   4.10
Total             26    70.23   2.80     23    64.00   2.02     67.31   3.98

Table 11.3b

Two-Way Analysis of Variance for Student Height as a Function of Gender and Marital Status

Variable and source     df       MS        F         η2
Student Height
  Gender                 1     420.67    73.69**    .632
  Marital                2      10.62     1.86      .080
  Gender*Marital         2      12.13     2.13      .090
  Error                 43       5.71

Note. η = .79 = effect size. **p < .001

Asterisk footnotes are used in Table 11.3b because p (Sig.) is not included in the table as a separate column.

The information in Table 11.3b is from the ANOVA table in the SPSS output.

The means in Table 11.3a are the average student heights for each gender group.

Discussion

As seen in previous literature, there is a relationship between gender and student height, with males taller than females (Jones, 1999; Blake, 2000). This might be explained by evolutionary tendencies for males to be taller than females. As seen from other articles, it was not unexpected that there was no relationship between marital status and student height and no interaction of marital status and gender.

11.4. Do gender and having children interact and do either seem to affect current GPA?

11.5. Are there differences between the age groups in regard to the average number of hours they a) study, b) work, and c) watch TV?
