assignment 1

Lecture One

· Statistics:

· Descriptive:

· What? Distribution of one variable (i.e., its center and dispersion); joint distribution of two variables (i.e., their relationship).

· How? Tabular, graphic, and numerical.

· Inferential:

· (Probability) sample → population (of interest)

Sample statistic → population parameter

· How? Confidence interval, hypothesis testing.

· Variable: a characteristic that varies, i.e., different cases take different values. [draw data structure]

· Unit of analysis: what the statement/variable is about. To see why this matters, consider the following hypothetical example:

· Are female applicants less likely to get admitted than male applicants at the department level?

· Are female applicants less likely to get admitted than male applicants at the college level?

· Scale of measurement:

· Variables are always coded using numeric values, but values have different functions.

· Nominal: values are used for differentiation only. Race, gender, marital status, etc.

· Ordinal: ranking order of values is meaningful. Life satisfaction, level of support, etc.

· Interval: distance between values is meaningful. HDI, IQ, temperature, etc.

· Ratio: true zero, i.e., 0 means “none”. Income in dollars, number of children, etc.

· Usually, we call both interval and ratio variables continuous variables.

· Why important? It determines what technique to use.
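
As a minimal sketch of how the scale of measurement can drive the choice of technique, the Python snippet below picks a summary of center by scale. The mapping used (mode for nominal, median for ordinal, mean for interval and ratio) is one common convention assumed here for illustration. The function name `center` and the sample values are hypothetical.

```python
from statistics import mean, median, mode

# Assumed convention (for illustration only): which summary of center
# is used for which scale of measurement.
CENTER_BY_SCALE = {
    "nominal": mode,     # values only differentiate, so count the most common one
    "ordinal": median,   # ranking is meaningful, so take the middle value
    "interval": mean,    # distances are meaningful, so averaging makes sense
    "ratio": mean,
}

def center(values, scale):
    """Return a summary of center chosen according to the scale."""
    return CENTER_BY_SCALE[scale](values)

# Hypothetical coded data
marital_status = [1, 2, 2, 3, 2]      # nominal codes
life_satisfaction = [1, 2, 2, 4, 5]   # ordinal codes
n_children = [0, 1, 2, 2, 4]          # ratio

print("nominal:", center(marital_status, "nominal"))
print("ordinal:", center(life_satisfaction, "ordinal"))
print("ratio:  ", center(n_children, "ratio"))
```

All three variables are stored as numbers, yet only the last one supports an average in any meaningful sense. That is the point of the remark that values have different functions.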

· [optional] Reliability and validity issues:

· Reliability: consistency across repeated measurements.

· Validity:

· Construct validity: are we measuring what we want to measure?

· Internal validity: causality = association + temporal order + lack of spuriousness.

· External validity: generalizability.

· Reliability and construct validity are criteria for evaluating measurements.

· Internal and external validity are criteria for evaluating research designs.

· There are more kinds of validity, and the terms mean different things to different researchers.

· More on variable:

· Why variable? Population heterogeneity == the fundamental truth of social science.

· Essentialism vs. Population thinking: is variation REAL?

· Data structure: row == case, column == variable

· What method to use: depends on the scale of measurement/type of the variable(s)

· What it is about: unit of analysis

· Variables in relationship: independent variable (a.k.a. predictor, explanatory variable, exogenous variable) → dependent variable (a.k.a. response, outcome, endogenous variable)

· Univariate description:

· The choice of methods (mainly) depends on the level of measurement.

· Frequency Table: title, frequency count (f), relative frequency (p), relative cumulative frequency

· Bar Chart: title, axes, labels, bars, (spacing b/w bars)

· Histogram: title, axes, labels, bars, (no spacing b/w bins), (class interval/bin width), (skewness)

· Contingency Table: cross-classification, cell counts, marginal totals, row/column percentages.

· To be continued…
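
The columns of a frequency table can be computed mechanically. The sketch below is a minimal Python illustration that uses the 25 workforce percentages from Practice Problem 4 as input. The function name `frequency_table` is hypothetical, and p is reported as a proportion rather than a percentage.

```python
from collections import Counter

# Percentage of the adult workforce made up by women, 25 countries
# (values copied from Practice Problem 4).
data = [37, 39, 46, 39, 46, 42, 39, 27, 48, 39, 42, 42, 45,
        45, 42, 39, 30, 47, 42, 30, 37, 42, 46, 42, 42]

def frequency_table(values):
    """Return rows of (value, f, p, cumulative p), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        f = counts[value]
        running += f  # running total of cases at or below this value
        rows.append((value, f, f / n, running / n))
    return rows

table = frequency_table(data)
print(f"{'X':>4} {'f':>3} {'p':>6} {'cum p':>6}")
for value, f, p, cum_p in table:
    print(f"{value:>4} {f:>3} {p:>6.2f} {cum_p:>6.2f}")
```

The relative frequency is f divided by the number of cases, and the relative cumulative frequency is the running total of f divided by the number of cases, so the last row always reaches 1.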

· Bivariate relationship:

· Association: X is associated/correlated with Y.

· Influence: X has an impact on Y.

· Causality: X causes Y.

· (Randomized Controlled) Experiment: random assignment

· Observational Study: association/influence + temporal order + no spuriousness

· Important implication: association or influence ≠ causality

· A third variable Z might complicate the observed bivariate relationship b/w X and Y:

· Spuriousness: the observed XY relationship is due to a common cause Z.

· Interaction/specification/modification: the observed XY relationship differs across groups (levels) of Z.

· Mediation/mechanism/intervening: X affects Y through Z, i.e., Z lies on the path from X to Y.
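
Spuriousness can be illustrated with a small simulation. In the Python sketch below, X and Y are each generated from a common cause Z plus independent noise, so neither affects the other. The data-generating setup (binary Z, normal noise with standard deviation 0.5, seed 0) and the helper `pearson` are assumptions made purely for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4000
z = [rng.randint(0, 1) for _ in range(n)]   # common cause Z (two groups)
x = [zi + rng.gauss(0, 0.5) for zi in z]    # X depends only on Z
y = [zi + rng.gauss(0, 0.5) for zi in z]    # Y depends only on Z

# Correlation in the pooled data vs. within each level of Z
overall = pearson(x, y)
within = {}
for group in (0, 1):
    xs = [xi for xi, zi in zip(x, z) if zi == group]
    ys = [yi for yi, zi in zip(y, z) if zi == group]
    within[group] = pearson(xs, ys)

print(f"overall r = {overall:.2f}")
for group, r in within.items():
    print(f"r within Z={group}: {r:.2f}")
```

In the pooled data X and Y are clearly correlated, but once Z is held constant the correlation is close to zero. This is the signature of a spurious relationship. With interaction, the within-group correlations would instead differ from one another.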

Practice Problems

1. Identify the level of measurement for each of the following variables:

(A) Type of residence (i.e., dorm, off-campus apartment, condominium, parents' home, or other)

(B) Height in inches

(C) The rating of the overall quality of a textbook on a scale from "Excellent" to "Poor"

(D) Lab section

(E) Level of measurement

2. For each of the following research questions, identify the unit of analysis, the independent variable, and the dependent variable:

(A) Are social movement participation rates higher in countries that have a longer history of democratic rule?

(B) Do school districts with more highly educated teachers tend to have higher standardized test scores among the students in the district?

3. Assume that the answers to the research questions in Problem 2 are both yes. Consider the following additional variables that go with the above questions:

(A) median household income of the country

(B) median household income of the school district

For each research question, decide whether the additional variable is most likely a source of spuriousness, a possible mechanism, or a possible modifier of the relationship implied in Problem 2. Explain each answer in one or two sentences, and draw a diagram showing how the additional variable is related to the independent and dependent variables from the corresponding part of Problem 2.

[Note that mechanism, mediator, and intervening variable are synonyms, as are interaction, specification, and modification.]

4. Among many other things, the United Nations Development Programme collects data from various countries on the percentage of the adult workforce made up by women. Here are the percentages for 25 countries included in a study done by the program in the mid-1990s: 37; 39; 46; 39; 46; 42; 39; 27; 48; 39; 42; 42; 45; 45; 42; 39; 30; 47; 42; 30; 37; 42; 46; 42; 42.

Show your work for each of the following (for repetitive calculations, such as relative frequency, you only have to show work for the first few repetitions):

(A) Determine the absolute and relative frequencies (i.e., proportions) of each value, as well as the cumulative relative frequency at each value. Do this by filling out a table in the following format:

% women in the workforce (X)   Frequency (f)   Relative Frequency (p, %)   Relative Cumulative Frequency (%)
27                             1               4                           4
30                             2               8                           12
37                             2               8                           20
39                             5               20                          40
42                             8               32                          72
45                             2               8                           80
46                             3               12                          92
47                             1               4                           96
48                             1               4                           100

(Note: include as many rows as you need to display the frequency of each data value. Do not group the data into class intervals. Instead, use one row for each value present in the dataset.)

(B) Plot your data on a frequency histogram. Now you will need to group your data into class intervals. Describe how you determine the intervals.

(C) Describe the shape of your frequency histogram.
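
Grouping values into class intervals, as part (B) requires, can be sketched in Python as follows. The choice of five intervals of width 5 starting at 25 (25-29, 30-34, ..., 45-49) is only an assumption for illustration, and other defensible choices exist. The function name `bin_counts` is hypothetical.

```python
# Percentage of the adult workforce made up by women, 25 countries
# (values copied from Problem 4).
data = [37, 39, 46, 39, 46, 42, 39, 27, 48, 39, 42, 42, 45,
        45, 42, 39, 30, 47, 42, 30, 37, 42, 46, 42, 42]

def bin_counts(values, start, width, n_bins):
    """Count values in the intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = (v - start) // width  # index of the interval containing v
        if not 0 <= k < n_bins:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[k] += 1
    return counts

# Assumed class intervals: 25-29, 30-34, 35-39, 40-44, 45-49
start, width, n_bins = 25, 5, 5
counts = bin_counts(data, start, width, n_bins)

# Text histogram: one row per class interval, one '#' per country
for k, f in enumerate(counts):
    lo = start + k * width
    print(f"{lo}-{lo + width - 1}: {'#' * f} ({f})")
```

Every value falls in exactly one interval, the intervals have equal width, and together they cover the full range of the data (27 to 48). Changing `start` or `width` changes the picture, which is why the problem asks you to describe how you determined the intervals.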

Example 1-1: Department vs. College

           Male                          Female                        # admitted (applicants × % admitted)
           # of applicants  % admitted   # of applicants  % admitted   Male       Female
College    2175             47           849              31           1024.29    260.68
Dept A     825              62           108              82           511.5      88.56
Dept B     560              63           25               68           352.8      17
Dept C     417              33           375              35           137.61     131.25
Dept D     373              6            341              7            22.38      23.87
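
The arithmetic behind Example 1-1 can be checked with a short Python sketch. It takes the applicant counts and admission percentages from the example and pools the departments to obtain the college-level rates. The function name `college_rate` is hypothetical.

```python
# (number of applicants, % admitted) by department, from Example 1-1
male = {"A": (825, 62), "B": (560, 63), "C": (417, 33), "D": (373, 6)}
female = {"A": (108, 82), "B": (25, 68), "C": (375, 35), "D": (341, 7)}

def college_rate(by_dept):
    """Pool all departments: total admitted / total applicants, in percent."""
    applicants = sum(n for n, _ in by_dept.values())
    admitted = sum(n * pct / 100 for n, pct in by_dept.values())
    return 100 * admitted / applicants

# Department level: compare the rates within each department
for dept in male:
    print(f"Dept {dept}: male {male[dept][1]}%, female {female[dept][1]}%")

# College level: compare the pooled rates
print(f"College: male {college_rate(male):.0f}%, "
      f"female {college_rate(female):.0f}%")
```

At the department level, female applicants have the higher admission rate in all four departments. At the college level the comparison reverses, with about 47% of men admitted versus about 31% of women. The reversal arises because most female applicants applied to Departments C and D, where admission rates are low for everyone. It is an instance of Simpson's paradox, which is why the unit of analysis has to be stated explicitly.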

THE BIG PICTURE

                                             Tabular             Graphic     Numerical
Univariate   Discrete            Nominal     Frequency Table     Bar Chart   Mode
                                 Ordinal                                     Mode, Median, Mean
             Continuous          Interval                        Histogram
                                 Ratio
Bivariate    Discrete-Discrete               Contingency Table   TBC         TBC
