Article/Data Analysis Exercise Assignment

Chapter 14: Interpreting Data

© 2018 Cengage Learning. All Rights Reserved.

Learning Objectives

• Understand that descriptive statistics are used to summarize data under study

• Describe a frequency distribution in terms of cases, attributes, and variables

• Recognize that measures of central tendency summarize data, but they do not convey the detail of the original data

• Understand that measures of dispersion give a summary indication of the distribution of cases around an average value

• Provide examples of rates as descriptive statistics that standardize some measure for comparative purposes

Learning Objectives, cont.

• Describe how bivariate analysis and subgroup comparisons examine relationships between two variables

• Compute and interpret percentages in contingency tables

• Understand that multivariate analysis examines the relationships among several variables

• Explain the logic underlying the proportionate reduction of error (PRE) model

• Describe the use of lambda (λ), gamma (γ), and Pearson’s product-moment correlation (r) as PRE-based measures of association for nominal, ordinal, and interval/ratio variables, respectively

• Summarize how regression equations and regression lines are used in data analysis

Learning Objectives, cont.

• Understand how inferential statistics are used to estimate the generalizability of findings arrived at in the analysis of a sample to a larger population

• Describe the meaning of confidence intervals and confidence levels in inferential statistics

• Explain what tests of statistical significance indicate, and how to interpret them

• Recognize the difference between statistical significance and substantive significance

• Understand that tests of statistical significance make assumptions about data and methods that are rarely satisfied completely in social science research

Introduction

• Empirical research usually uses some type of statistical analysis

• Mathematics: Language for accomplishing logical operations inherent in good data analysis

• Statistics: Branch of math appropriate to research

– Descriptive statistics: Method for describing data in manageable forms

– Inferential statistics: Assist in forming conclusions from our observations about a population, based on studying the sample

Univariate Description

• Univariate Analysis: Only one variable at a time

• Bivariate Analysis: Two variables

• Multivariate Analysis: Three or more variables

• Distributions: Reporting all individual cases

– Marginals: Frequency distributions of grouped data (age of students)

– Frequency Distribution: (2, 7, 11, 14, 16)

Measures of Central Tendency

• “Summary Averages”

• Mode: Most frequent attribute

• Mean: Sum of all values divided by the total number of values

• Median: Middle attribute of ranked data
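
As a minimal sketch of the three summary averages, the example below uses Python's standard `statistics` module. The list `ages` is hypothetical data invented for illustration; it does not come from the chapter.

```python
from statistics import mean, median, multimode

# Hypothetical ages for nine cases (illustrative data only).
ages = [13, 14, 14, 15, 15, 15, 16, 17, 19]

modes = multimode(ages)   # most frequent value(s); a list, since ties are possible
average = mean(ages)      # sum of all values divided by the number of values
middle = median(ages)     # middle value once the data are ranked

print("mode(s):", modes)
print("mean:", round(average, 2))
print("median:", middle)
```

All three numbers summarize the same nine cases, yet none of them lets a reader reconstruct the original list, which is the trade-off the learning objectives describe.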

Measures of Dispersion & Computing Rates

• Range: Distance separating the highest value from the lowest value

• Standard Deviation: The average amount of variation about the mean; the square root of the variance

• Variance: Sum of squared deviations from the mean divided by the total number of cases

• Percentile: What percentage of cases fall at or below some value; can be grouped into quartiles

• Rates: Used to standardize some measure for comparative purposes
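
The measures on this slide can be sketched in a few lines of Python. The data in `values`, the helper names `percentile_rank` and `rate_per`, and the base of 100,000 are assumptions chosen for illustration. The population forms of variance and standard deviation are used here because the slide divides by the total number of cases.

```python
from statistics import pstdev, pvariance

# Hypothetical scores for eight cases (illustrative data only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)   # highest value minus lowest value
variance = pvariance(values)              # sum of squared deviations from the mean / N
std_dev = pstdev(values)                  # square root of the variance


def percentile_rank(data, value):
    """Percentage of cases that fall at or below `value`."""
    at_or_below = sum(1 for item in data if item <= value)
    return 100 * at_or_below / len(data)


def rate_per(events, population, base=100_000):
    """Standardize a count as a rate per `base` population."""
    return events * base / population


print(value_range, variance, std_dev)
print(percentile_rank(values, 5))   # share of cases at or below a score of 5
print(rate_per(250, 50_000))        # 250 events in a population of 50,000
```

Expressing counts as rates is what makes places of different size comparable: 250 events in a city of 50,000 and 2,500 events in a city of 500,000 are the same rate.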

Discussion Question 1

Which variables would you prefer to use in your research: continuous, discrete, or both?

Discussion Question 2

What if you had to read a large number of studies as a part of your research? Do you think you would be more concerned about having enough detail or being able to manage the data easily?

Bivariate Analysis

• We are interested in how variables are related (explanation)

• Contingency table: Used to compare subgroups; percentage down the columns, then read across the rows

• Values of the dependent variable are contingent on values of the independent variable
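
The "percentage down, read across" rule can be sketched as follows. The cases, the variable names, and the helper `column_percentages` are hypothetical. Each case is a pair of (independent variable, dependent variable), and percentages are computed within each category of the independent variable.

```python
from collections import Counter

# Hypothetical cases: (sex, fear of crime). Illustrative counts only.
cases = (
    [("men", "afraid")] * 20
    + [("men", "not afraid")] * 80
    + [("women", "afraid")] * 45
    + [("women", "not afraid")] * 55
)


def column_percentages(pairs):
    """Percentage each cell within its independent-variable column."""
    cell_counts = Counter(pairs)
    column_totals = Counter(independent for independent, _ in pairs)
    return {
        (independent, dependent): 100 * count / column_totals[independent]
        for (independent, dependent), count in cell_counts.items()
    }


percentages = column_percentages(cases)

# Read across the "afraid" row to compare the subgroups.
print("men afraid:", percentages[("men", "afraid")], "%")
print("women afraid:", percentages[("women", "afraid")], "%")
```

Because each column sums to 100 percent, the subgroups can be compared directly even when they contain different numbers of cases.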

Multivariate Analysis

• Instead of explaining the dependent variable on the basis of a single independent variable, seek an explanation through the use of more than one independent variable

Measures of Association

• Indicate the strength of a relationship (on a scale from 0 to 1)

• Based on Proportionate Reduction of Error (PRE):

– How much variation in y can be predicted by x; how much you can reduce your error in predicting y by knowing x

– The greater the relationship between two variables, the greater the reduction of error
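
One way to make the PRE logic concrete is to compute lambda for a small table. This is a sketch under stated assumptions: the counts in `employment_by_sex` are invented, and `lambda_pre` is a hypothetical helper. It counts the errors made by always guessing the most common value of y, then the errors made when guessing the most common value of y within each category of x.

```python
# Hypothetical table: table[x_category][y_category] = number of cases.
employment_by_sex = {
    "men": {"employed": 900, "unemployed": 100},
    "women": {"employed": 200, "unemployed": 800},
}


def lambda_pre(table):
    """Lambda: proportionate reduction of error in guessing y once x is known."""
    y_totals = {}
    for column in table.values():
        for y_category, count in column.items():
            y_totals[y_category] = y_totals.get(y_category, 0) + count

    total_cases = sum(y_totals.values())
    # Errors when always guessing the overall modal category of y.
    errors_without_x = total_cases - max(y_totals.values())
    # Errors when guessing the modal category of y within each x category.
    errors_with_x = sum(
        sum(column.values()) - max(column.values()) for column in table.values()
    )
    # Undefined (division by zero) if every case shares one y category.
    return (errors_without_x - errors_with_x) / errors_without_x


print(round(lambda_pre(employment_by_sex), 3))
```

In this invented table, guessing without x produces 900 errors and guessing with x produces 300, so knowing x removes two-thirds of the errors.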

Levels of Measurement

• Nominal Variables: Gender, marital status, or race

– Lambda (λ): Based on your ability to guess values on one of the variables

• Ordinal Variables: Occupational status, education

– Gamma (γ): Same as lambda, except based on the ordinal arrangement of values

• Interval or Ratio Variables: Age, income

– Pearson’s product-moment correlation (r)
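
Gamma and Pearson's r can both be sketched with plain Python. The helper names and the sample data are hypothetical. Gamma compares pairs of cases ranked in the same order on both variables with pairs ranked in opposite order, ignoring ties. Pearson's r uses deviations from the means, and its square is usually read as the proportion of variation explained.

```python
from itertools import combinations
from math import sqrt


def gamma(pairs):
    """Gamma = (same-order pairs - opposite-order pairs) / (same + opposite)."""
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            same += 1
        elif product < 0:
            opposite += 1
        # Pairs tied on either variable are ignored.
    return (same - opposite) / (same + opposite)


def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical ordinal ranks (1 = low, 3 = high) for four cases.
ranked_cases = [(1, 1), (1, 2), (2, 2), (3, 1)]
print(gamma(ranked_cases))

# Hypothetical interval data: age and number of prior arrests.
ages = [18, 22, 25, 30, 40]
arrests = [1, 2, 2, 4, 5]
print(round(pearson_r(ages, arrests), 3))
```

Both measures can be negative when high values on one variable go with low values on the other; the sign gives the direction and the absolute value gives the strength.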

Regression Analysis

• Variables are linearly related:

– The mean of Y changes linearly with X

– Check scatterplot for general linear trend

– Watch out for nonlinear relationships

• Y is normally distributed for every outcome of X in the population (“conditional normality”)

– Ex: Income = X, Happiness = Y

– Is a histogram of happiness approximately normal for those with X = $25K? $50K? $100K?

– If all are roughly normal, the assumption is met

Regression Analysis, cont.

• Association between two variables: Y = f(X)

• Regression Line: When all the points lie on a straight line, we can superimpose that line over the points; Y' = a + b(X)

• “Unexplained Variation”: The sum of squared differences between actual and estimated values of Y

– Represents errors that exist even when estimates are based on known values of X

• “Explained Variation”: The difference between the total variation and the unexplained variation
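
The regression quantities above can be illustrated with a short sketch. The slide does not say how the line is fitted; ordinary least squares is assumed here, and the data and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y' = a + b(X)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data for five cases (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
estimated = [a + b * x for x in xs]   # Y' for each known value of X

mean_y = sum(ys) / len(ys)
total_variation = sum((y - mean_y) ** 2 for y in ys)
# Squared differences between actual and estimated values of Y.
unexplained_variation = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, estimated))
explained_variation = total_variation - unexplained_variation

print("Y' = %.2f + %.2f(X)" % (a, b))
print("share of variation explained:", round(explained_variation / total_variation, 2))
```

The ratio of explained to total variation is the PRE reading of the regression: how much the error in estimating Y shrinks once X is known.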

Inferential Statistics

• When we generalize from samples to larger populations, we use inferential statistics to test the significance of an observed relationship

• Data analysis & sampling

– Most research projects involve samples

– Ultimate purpose is to make inferences about the larger (target) population

– Both univariate and multivariate findings can be interpreted as a basis for inference

Univariate Inferences

• Univariate Measures: Percentages & Means

• Any statement of sampling error must contain two essential components:

– Confidence Level

– Confidence Interval

• Inferential statistics apply to sampling error only; they do not take account of nonsampling errors
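
A statement of sampling error for a sample percentage can be sketched as below. The sketch assumes a simple random sample and the normal approximation to the sampling distribution; the helper `proportion_interval` and the survey figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(sample_proportion, sample_size, confidence_level=0.95):
    """Confidence interval for a proportion (normal approximation)."""
    # z-score that leaves (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    standard_error = sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
    margin = z * standard_error
    return sample_proportion - margin, sample_proportion + margin


# Hypothetical survey: 50 percent of 400 respondents report some attribute.
low, high = proportion_interval(0.50, 400)
print("95%% confidence interval: %.3f to %.3f" % (low, high))

# A higher confidence level requires a wider interval.
low_99, high_99 = proportion_interval(0.50, 400, confidence_level=0.99)
print("99%% confidence interval: %.3f to %.3f" % (low_99, high_99))
```

The interval reflects sampling error only. Nonsampling errors such as poorly worded questions or nonresponse are not captured by this calculation.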

Tests of Statistical Significance

• So, two variables are related? Is the relationship a significant one?

– Parametric tests of significance can tell us

– We report probability that a parameter falls within a certain range (confidence interval) and that degree of uncertainty is due to normal sampling error

Tests of Statistical Significance, cont.

• Statistical significance is expressed with probabilities

• What does the p-value mean?

• Significance at the .05 level means that the probability of obtaining a relationship this strong by chance alone (sampling error) is no more than 5 in 100 (1 in 100 at the .01 level)

• If it’s not by chance, it represents a real relationship between the variables!
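
The idea of "by chance alone" can be illustrated by simulation. The sketch below uses a permutation test, which is a different technique from the parametric tests named on the previous slide; it is chosen here only because it makes the logic visible. The two groups, the helper names, the number of trials, and the seed are all assumptions.

```python
import random

# Hypothetical scores for two groups of eight cases (illustrative only).
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [9, 10, 11, 8, 12, 10, 9, 11]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, trials=5000, seed=0):
    """Share of random relabelings with a mean difference at least as large
    as the observed one, i.e. how often chance alone does as well."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[: len(a)], pooled[len(a):]
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


p_value = permutation_p_value(group_a, group_b)
print("significant at the .05 level:", p_value < 0.05)
```

A small p-value says only that chance is an unlikely explanation for the difference; it says nothing about whether the difference is large enough to matter substantively.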

Discussion Question 3

What if someone insisted that they were 100 percent certain about their survey results? Could you challenge that statement? How?

Chi Square

• Based on the Null Hypothesis: the assumption that there is no relationship between two variables in a population

• Compares what you get (empirical) with what you expect given a null hypothesis of no relationship

• Computing: For each cell in the table, we

– Subtract the expected frequency for that cell from the observed frequency

– Square this quantity, and

– Divide the squared difference by the expected frequency

• Chi square is the sum of these values across all cells
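
The cell-by-cell computation can be sketched as follows. The observed counts are hypothetical, and `chi_square` is an illustrative helper. Expected frequencies are derived from the row and column totals, which is what the null hypothesis of no relationship implies.

```python
def chi_square(observed):
    """Chi square for a table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total_cases = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Frequency expected in this cell if the variables were unrelated.
            expected = row_totals[i] * column_totals[j] / total_cases
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Hypothetical table. Rows: afraid, not afraid. Columns: men, women.
fear_by_sex = [
    [20, 45],
    [80, 55],
]
print(round(chi_square(fear_by_sex), 3))
```

The resulting value is then compared with a chi-square distribution for the table's degrees of freedom, (rows - 1) x (columns - 1), to judge statistical significance.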

Interpreting Statistical Significance

• Significance tests are a guideline, not the ultimate standard

• Dangers due to sampling error, sample size, etc.

• Check and compare to other tests

• "Empirical research is, first and foremost, a logical rather than a mathematical operation."

Visualizing Discernible Differences

• What is a statistically discernible difference?

– Results from tests on a nonrandom sample would be considered statistically significant if found in a random sample

– Findings should be viewed as important but not statistically significant
