Statistics Professional Needed IMMEDIATELY


Running head: WEEK 2 DATA ANALYSIS AND APPLICATION 1


Assignment 2

Week 2 Data Analysis and Application

(This example introduction can give you an idea of what I’m looking for to introduce each assignment. You CAN use exactly this introduction for the week 2 assignment, or you can write your own.) Use the other highlighted sections from this example for the information needed to complete the assignment. You can paraphrase those highlighted sections to introduce or explain your responses to those questions, but do not use the exact wording.

The purpose of this assignment is to explore fundamental concepts from prior statistics courses. Analyses include comparisons of descriptive versus inferential statistics, parametric versus nonparametric statistics, and internal versus external validity. The assignment also explores the limitations of a convenience sample, the meaning and calculation of the Sum of Squares (SS) statistic, the utility of z-scores, the conceptual meaning of the SEm statistic, the use of the t distribution, and an analysis of appropriate and inappropriate variables for use in a Pearson correlation.

1) Descriptive versus Inferential Statistics

2) Parametric versus Nonparametric Statistics

Refer to Table 1.3 on page 24 of the Warner (2013) textbook.

3) Internal versus External Validity

4) Convenience Samples

5) Value of SS

Note: To answer this question, you must understand what SS represents. Broadly speaking, all of the inferential statistics we study in this course can be conceptualized in terms of a “signal” to “noise” ratio:

Inferential statistic = signal / noise

Typically, our “signal” is a measure of centrality like a mean score (M) on some variable Y. The strength of this signal is divided by the amount of “noise,” or variability, in variable Y. The “Sum of Squared Deviations” (SS) statistic is a measure of dispersion (variability; noise) of scores around a mean (M). A large SS represents a large amount of variability or noise in variable Y; a small SS represents a small amount of variability or noise in variable Y. We will see applications of SS repeatedly in this course, so you must gain an intuitive understanding of what it represents. The SS mathematical formula is expressed as:

SS = Σ (X – M)²

Think of mathematical equations as sentences to be translated. This formula says,

“The Sum of Squared Deviations (SS) is equal to the sum (Σ) of the squared deviations of all scores (X1, X2…Xn) from the mean (M).” That statement requires us to subtract the mean (i.e., our signal) from each individual score X to calculate a deviation score, then square that deviation score, and then add up all of the squared deviation scores to calculate SS (i.e., our noise). Again, the larger the SS value, the larger the amount of variability in a sample set of data.
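The three steps in that translation (deviation, square, sum) can be sketched in a few lines of Python. This is only an illustration: the function name sum_of_squares and the two small score lists are made up for the example and do not come from the course data set.

```python
def sum_of_squares(scores):
    """Return SS: the sum of squared deviations of the scores around their mean."""
    mean = sum(scores) / len(scores)            # M, the "signal"
    deviations = [x - mean for x in scores]     # (X - M) for each score
    return sum(d ** 2 for d in deviations)      # add up the squared deviations


# Hypothetical scores: both lists have a mean of 10, but different spread.
tight_scores = [9, 10, 10, 11]
spread_scores = [2, 6, 14, 18]

ss_tight = sum_of_squares(tight_scores)
ss_spread = sum_of_squares(spread_scores)
print("SS (tightly clustered scores):", ss_tight)
print("SS (widely spread scores):", ss_spread)
```

Both lists share the same mean, so the difference in SS reflects only how far the scores fall from that mean.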

6) z Scores and Proportion of the Population

This question tests your understanding of the normal distribution, the proportion of area under the curve, and the calculation of z-scores, which should have been reviewed in previous statistics courses:

z = (X - µ)/ σ

6a. What proportion of people would be expected to score below 20?

z = (X – 30)/ 10 = (20-30)/ 10 = -1.0

The area to the left of z = -1.0 (i.e., “below 20”) is .1587. We can see this by referring to Figure 1.4 on page 13 of Warner (2013). “Below 20” refers to the area under the curve from z = -∞ to -1, or .0013 + .0214 + .1359 = .1586, which agrees with the tabled value of .1587 to within rounding. You may also refer to Appendix A at the back of the Warner (2013) textbook. Locate z = 1.00 (refer to the –z figures at the bottom of page 977). The area under the curve below z = -1.0 is .1587. That is, about 16 percent of the population would be expected to have anxiety scores below 20.
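If you want to check a table lookup like this one, the calculation can be sketched in Python using only the standard library. The mean of 30 and standard deviation of 10 are the values used in 6a; the helper names z_score and area_below are made up for this illustration, and the area is computed from the standard normal cumulative distribution rather than read from Appendix A.

```python
import math


def z_score(x, mu, sigma):
    """z = (X - mu) / sigma"""
    return (x - mu) / sigma


def area_below(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = z_score(20, mu=30, sigma=10)
proportion = area_below(z)
print("z =", z)
print("proportion below 20 =", round(proportion, 4))
```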

6b. What proportion of people would be expected to score above 30?

6c. What proportion of people would be expected to score between 10 and 50?

This question is asking us for the proportion of scores falling between z = -2 and z = 2.
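One way to think about an "in between" question is as a subtraction: the area below the upper z value minus the area below the lower z value. The sketch below assumes the same mean (30) and standard deviation (10) used in 6a; the function names are hypothetical, and you should still confirm the result against Figure 1.4 or Appendix A of Warner (2013).

```python
import math


def area_below(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_between(z_low, z_high):
    """Proportion falling between two z values."""
    return area_below(z_high) - area_below(z_low)


mu, sigma = 30, 10            # values taken from the worked example in 6a
z_low = (10 - mu) / sigma     # z for a score of 10
z_high = (50 - mu) / sigma    # z for a score of 50
proportion = area_between(z_low, z_high)
print("z range:", z_low, "to", z_high)
print("proportion between 10 and 50:", round(proportion, 4))
```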

7) SEM

7a. As s increases, how does the size of SEM change (assuming that N stays the same)?

7b. As N increases, how does the size of SEM change (assuming that s stays the same)?
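To explore both parts of Question 7, it can help to plug a few numbers into the usual formula SEM = s / √N. This template does not state that formula, so treat it as an assumption to verify against Warner (2013); the values of s and N below are made up for illustration.

```python
import math


def standard_error_of_mean(s, n):
    """SEM = s / sqrt(N), assuming the usual textbook definition."""
    return s / math.sqrt(n)


# 7a: vary s while holding N fixed (hypothetical values).
for s in (5, 10, 20):
    print("s =", s, "N = 25", "SEM =", standard_error_of_mean(s, 25))

# 7b: vary N while holding s fixed (hypothetical values).
for n in (25, 100, 400):
    print("s = 10", "N =", n, "SEM =", standard_error_of_mean(10, n))
```

Comparing the printed values within each loop shows the direction of change you are asked to describe in your own words.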

8) t Distribution

9) Good versus Poor Variables for Correlation

To answer Question #9, we begin by reviewing the assumptions of correlation. A violation of assumptions leads to the inference that a variable is a “bad” candidate for correlation.

Assumptions of correlation. Let us limit our analysis to the Pearson correlation; the assumptions are as follows:

1) Each variable (X and Y) is normally distributed. (Explain what this means; also consider outliers and what to do about them.)

2) Both X and Y must be on the interval/ratio scale of measurement.

3) Variables X and Y are linearly related. (explain what this means)

Determining “good” variables. A “good” variable will not seriously violate the assumptions above. How would I determine two “good” variables in this data set? My first step would be to view all of the relevant descriptive statistics for all eight variables in the sample, which I would then confirm with an analysis of histograms (or you could run histograms first, and then confirm with descriptive statistics; both are necessary).

(Insert Descriptive Statistics here)

(Insert histograms here)

(Describe what you found and what it means. From this you should be able to determine one “good” and one “bad” variable. Explain this.)

(Confirm with scatterplots)

Insert Scatterplot of “good” variables. Describe what it means.

Insert Scatterplot of “bad” variables. Describe what it means.

If you wish, calculate the Pearson correlation for both the “good” and “bad” variables.
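For the assignment itself you would run the correlation in your statistics package, but the arithmetic behind Pearson r can be sketched in Python to show how it builds on SS. The function name pearson_r and the two short score lists are hypothetical; they are not variables from the course data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products divided by sqrt(SS_x * SS_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of the paired deviation scores.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)   # SS for X
    ss_y = sum((y - mean_y) ** 2 for y in ys)   # SS for Y
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical paired scores for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_linear = [2, 4, 6, 8, 10]    # rises in step with X
y_mixed = [5, 3, 4, 1, 2]      # tends to fall as X rises

print("r (X with y_linear):", pearson_r(x_scores, y_linear))
print("r (X with y_mixed):", pearson_r(x_scores, y_mixed))
```

Note that r by itself does not tell you whether the assumptions were met, which is why the scatterplots and histograms above are still needed.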

What other information would I want?

10) Improving the Poor Candidate

References