Article/Data Analysis Exercise Assignment
Chapter 14:
Interpreting Data
© 2018 Cengage Learning. All Rights Reserved.
Learning Objectives
• Understand that descriptive statistics are used to summarize data under study
• Describe a frequency distribution in terms of cases, attributes, and variables
• Recognize that measures of central tendency summarize data, but they do not convey the detail of the original data
• Understand that measures of dispersion give a summary indication of the distribution of cases around an average value
• Provide examples of rates as descriptive statistics that standardize some measure for comparative purposes
Learning Objectives, cont.
• Describe how bivariate analysis and subgroup comparisons examine relationships between two variables
• Compute and interpret percentages in contingency tables
• Understand that multivariate analysis examines the relationships among several variables
• Explain the logic underlying the proportionate reduction of error (PRE) model
• Describe the use of lambda (λ), gamma (γ), and Pearson’s product-moment correlation (r) as PRE-based measures of association for nominal, ordinal, and interval/ratio variables, respectively
• Summarize how regression equations and regression lines are used in data analysis
Learning Objectives, cont.
• Understand how inferential statistics are used to estimate the generalizability of findings arrived at in the analysis of a sample to a larger population
• Describe the meaning of confidence intervals and confidence levels in inferential statistics
• Explain what tests of statistical significance indicate, and how to interpret them
• Recognize the difference between statistical significance and substantive significance
• Understand that tests of statistical significance make assumptions about data and methods that are rarely satisfied completely in social science research
Introduction
• Empirical research usually uses some type of statistical analysis
• Mathematics: Language for accomplishing logical operations inherent in good data analysis
• Statistics: Branch of math appropriate to research
– Descriptive statistics: Method for describing data in manageable forms
– Inferential statistics: Assist in forming conclusions from our observations about a population, based on studying a sample
Univariate Description
• Univariate Analysis: Only one variable at a time
• Bivariate Analysis: Two variables
• Multivariate Analysis: Three or more variables
• Distributions: Reporting all individual cases
– Marginals: Frequency distributions of grouped data (age of students)
– Frequency Distribution: Number of cases observed for each attribute of a variable (e.g., 2, 7, 11, 14, 16)
Measures of Central Tendency
• “Summary Averages”
• Mode: Most frequent attribute
• Mean: Sum of all values divided by the total number of values
• Median: Middle attribute of ranked data
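As a rough illustration, the three summary averages can be computed with Python's standard `statistics` module. The ages below are made-up values, not data from the chapter.

```python
import statistics

# Hypothetical ages for nine cases (illustrative data only).
ages = [13, 14, 14, 15, 15, 15, 16, 17, 19]

mode = statistics.mode(ages)      # most frequent attribute
mean = statistics.mean(ages)      # sum of all values / number of values
median = statistics.median(ages)  # middle value of the ranked data

print(f"mode={mode}, mean={mean:.2f}, median={median}")
```

Note that each average collapses nine cases into one number, so the detail of the original distribution is lost.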
Measures of Dispersion & Computing Rates
• Range: Distance separating the highest value from the lowest value
• Standard Deviation: The average amount of variation about the mean (the square root of the variance)
• Variance: Sum of squared deviations from the mean divided by the total number of cases
• Percentile: What percentage of cases fall at or below some value; can be grouped into quartiles
• Rates: Used to standardize some measure for comparative purposes
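The measures above can be sketched in a few lines of Python. The scores, the crime count, and the population figure are hypothetical, and `rate_per` is a helper name invented for this example.

```python
import statistics

# Hypothetical scores for eight cases (illustrative data only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)        # highest minus lowest value
variance = statistics.pvariance(scores)        # sum of squared deviations / N
std_dev = statistics.pstdev(scores)            # square root of the variance
quartiles = statistics.quantiles(scores, n=4)  # three cut points for quartiles

def rate_per(count, population, base=100_000):
    """Standardize a raw count as a rate per `base` people."""
    return count * base / population

# Hypothetical city: 450 burglaries among 150,000 residents.
burglary_rate = rate_per(450, 150_000)

print(value_range, variance, std_dev, quartiles, burglary_rate)
```

Expressing the count as a rate per 100,000 lets cities of different sizes be compared on the same footing.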
Discussion Question 1
Which variables would you prefer to use in your research: continuous, discrete, or both?
Discussion Question 2
What if you had to read a large number of studies as a part of your research? Do you think you would be more concerned about having enough detail or being able to manage the data easily?
Bivariate Analysis
• We are interested in how variables are related (explanation)
• Contingency table: Used to compare subgroups; percentage down each column, then read across each row
• Values of the dependent variable are contingent on values of the independent variable
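A minimal sketch of percentaging a contingency table, assuming the independent variable (gender) defines the columns and the dependent variable (attitude) defines the rows. The counts and the function name `percentage_down` are hypothetical.

```python
# Hypothetical counts: columns are subgroups of the independent variable,
# cells are attributes of the dependent variable.
counts = {
    "men":   {"approve": 60,  "disapprove": 40},
    "women": {"approve": 120, "disapprove": 30},
}

def percentage_down(table):
    """Convert each column's counts to percentages of that column's total."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result

percentages = percentage_down(counts)

# Read across the "approve" row to compare the subgroups.
for column, cells in percentages.items():
    print(f"{column}: {cells['approve']:.0f}% approve")
```

Because each column sums to 100 percent, subgroups of different sizes can be compared directly by reading across a row.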
Multivariate Analysis
• Instead of explaining the dependent variable on the basis of a single independent variable, seek an explanation through the use of more than one independent variable
Measures of Association
• Indicate strength of relationship (magnitude from 0 to 1)
• Based on Proportionate Reduction of Error (PRE):
– How much variation in y can be predicted by x; how much you can reduce your error in predicting y by knowing x
– The greater the relationship between two variables, the greater the reduction of error
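The PRE logic can be sketched with lambda for two nominal variables: count the errors made when guessing the modal category of y for everyone, then the errors made when guessing the modal category within each category of x. The table and the function name `lambda_pre` are hypothetical.

```python
# Hypothetical counts: x = gender, y = employment status.
table = {
    "men":   {"employed": 900, "unemployed": 100},
    "women": {"employed": 200, "unemployed": 800},
}

def lambda_pre(table):
    """Proportionate reduction of error in guessing y once x is known."""
    # Overall distribution of y, ignoring x.
    totals = {}
    for cells in table.values():
        for category, n in cells.items():
            totals[category] = totals.get(category, 0) + n
    n_cases = sum(totals.values())

    # Errors when always guessing the overall modal category of y.
    errors_without_x = n_cases - max(totals.values())
    # Errors when guessing the modal category of y within each category of x.
    errors_with_x = sum(
        sum(cells.values()) - max(cells.values()) for cells in table.values()
    )
    return (errors_without_x - errors_with_x) / errors_without_x

lam = lambda_pre(table)
print(f"lambda = {lam:.2f}")
```

Here knowing x cuts the guessing errors from 900 to 300, a reduction of two-thirds.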
Levels of Measurement
• Nominal Variables: Gender, marital status, or race
– Lambda (λ): Based on your ability to guess values on one of the variables
• Ordinal Variables: Occupational status, education
– Gamma (γ): Same as lambda, except based on the ordinal arrangement of values
• Interval or Ratio Variables: Age, income
– Pearson’s product-moment correlation (r)
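For the ordinal and interval/ratio cases, a rough sketch of gamma and Pearson's r follows. Gamma compares pairs of cases ranked in the same order on both variables with pairs ranked in opposite order, ignoring ties. The data and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def gamma(cases):
    """Gamma for ordinal data given (x, y) rank codes: (same - opposite) / (same + opposite)."""
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            same += 1        # pair ordered the same way on both variables
        elif direction < 0:
            opposite += 1    # pair ordered in opposite ways
        # direction == 0 means a tie on x or y; ties are ignored
    return (same - opposite) / (same + opposite)

def pearson_r(xs, ys):
    """Pearson's product-moment correlation for interval/ratio data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ordinal codes (1 = low, 2 = medium, 3 = high).
ordinal_cases = [(1, 2), (2, 1), (2, 3), (3, 3)]
g = gamma(ordinal_cases)

# Hypothetical interval/ratio values.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(f"gamma = {g:.2f}, r = {r:.2f}, r squared = {r * r:.2f}")
```

Both gamma and r can be negative; the sign gives the direction of the relationship, and for r it is the squared value that carries the proportionate-reduction-of-error interpretation.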
Regression Analysis
• Variables are linearly related:
– The mean of Y increases linearly with X
– Check scatterplot for general linear trend
– Watch out for nonlinear relationships
• Y is normally distributed for every outcome of X in the population; “conditional normality”
– Ex: Income = X, Happiness = Y
– Is a histogram of happiness approximately normal for those with X = $25K? $50K? $100K?
– If all are roughly normal, the assumption is met
Regression Analysis, cont.
• Association between two variables: Y = f(X)
• Regression Line: When the plotted points lie on a straight line, we can superimpose that line over the points; Y' = a + b(X)
• “Unexplained Variation”: The sum of squared differences between actual and estimated values of Y
– Represents errors that exist even when estimates are based on known values of X
• “Explained Variation”: The difference between the total variation and the unexplained variation
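The regression line and the two kinds of variation can be sketched as follows, assuming an ordinary least-squares fit of Y' = a + b(X). The data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y' = a + b(X)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

# Hypothetical observations (illustrative data only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
estimates = [a + b * x for x in xs]
mean_y = sum(ys) / len(ys)

# Total variation: squared differences between actual Y and the mean of Y.
total = sum((y - mean_y) ** 2 for y in ys)
# Unexplained variation: squared differences between actual and estimated Y.
unexplained = sum((y - est) ** 2 for y, est in zip(ys, estimates))
# Explained variation: total minus unexplained.
explained = total - unexplained

print(f"Y' = {a:.1f} + {b:.1f}(X)")
print(f"explained share of variation = {explained / total:.2f}")
```

The explained share of the total variation is the squared correlation between X and Y.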
Inferential Statistics
• When we generalize from samples to larger populations, we use inferential statistics to test the significance of an observed relationship
• Data analysis & sampling
– Most research projects involve samples
– Ultimate purpose is to make inferences about the larger (target) population
– Both univariate and multivariate findings can be interpreted as a basis for inference
Univariate Inferences
• Univariate Measures: Percentages & Means
• Any statement of sampling error must contain two essential components:
– Confidence Level
– Confidence Interval
• Inferential statistics apply to sampling error only; they do not take account of nonsampling errors
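A minimal sketch of how a confidence level and a confidence interval go together for a sample percentage, assuming simple random sampling and the normal approximation. The sample figures and the function name `proportion_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p, n, confidence=0.95):
    """Confidence interval for a sample proportion p based on n cases."""
    # z-score that leaves (1 - confidence) / 2 in each tail of the normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin

# Hypothetical survey: 50 percent of 400 respondents favor a policy.
low, high = proportion_interval(0.50, 400)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Raising the confidence level widens the interval, and the calculation covers sampling error only, not nonsampling errors such as poorly worded questions.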
Tests of Statistical Significance
• So, two variables are related? Is the relationship a significant one?
– Parametric tests of significance can tell us
– We report probability that a parameter falls within a certain range (confidence interval) and that degree of uncertainty is due to normal sampling error
Tests of Statistical Significance, cont.
• Statistical significance is expressed with probabilities
• What does the p-value mean?
• Significance at the .05 level means that the probability of achieving the result by chance alone is 5 out of 100 (or 1 out of 100 at the .01 level)
• If the result is unlikely to be due to chance, it suggests a real relationship between the variables
Discussion Question 3
What if someone insisted that they were 100 percent certain about their survey results? Could you challenge that statement? How?
Chi Square
• Based on the Null Hypothesis: the assumption that there is no relationship between two variables in a population
• Compares what you get (empirical) with what you expect given a null hypothesis of no relationship
• Computing: For each cell in the table, we
– Subtract the expected frequency for that cell from the observed frequency
– Square this quantity, and
– Divide the squared difference by the expected frequency
• Chi square is the sum of these results across all cells
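The cell-by-cell computation can be sketched as follows. The observed counts and the function name `chi_square` are hypothetical, and the shortcut used for the p-value applies only to a 2 x 2 table (one degree of freedom).

```python
from math import erfc, sqrt

# Hypothetical observed frequencies for a 2 x 2 table.
observed = [
    [40, 60],
    [60, 40],
]

def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected frequency if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (obs - expected) ** 2 / expected
    return total

chi2 = chi_square(observed)

# Assumption: one degree of freedom, so the chance probability
# can be computed from the normal curve via erfc.
p_value = erfc(sqrt(chi2 / 2))

print(f"chi square = {chi2:.1f}, p = {p_value:.4f}")
```

A p-value below .01 here would mean a difference this large would arise by sampling error alone less than 1 time in 100 if the null hypothesis were true.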
Interpreting Statistical Significance
• Significance tests are a guideline, not the ultimate standard
• Dangers due to sampling error, sample size, etc.
• Check and compare to other tests
• "Empirical research is, first and foremost, a logical rather than a mathematical operation."
Visualizing Discernible Differences
• What is a statistically discernible difference?
– Results from tests on a nonrandom sample that would be considered statistically significant if found in a random sample
– Findings should be viewed as important but not statistically significant