HW1
This entire assignment is due before 11:59 pm on Monday, Sept. 13. Submit your work as a single zip file. While you may discuss the homework among yourselves, the work must be individual. There must be no sharing of code.
PART I
We will explore a basic issue that arises often in analytics, namely whether to use a one-sided or a two-sided Student’s t-test, and also whether the test should be paired. You may want to review Student’s t-test either in your old statistics textbook or on Wikipedia (it is accurate in this case). You will be using the data below. Assume that ‘statistical significance’ means a confidence level of 95% (that is, a significance level of α = 0.05). To do the analysis in the questions, you may use any tool you wish (I will not ask for evidence), including Excel, Python, R, etc. I do recommend using a tool, because it would not do to lose points on manual calculation errors.
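For quick reference, here are the standard textbook forms of the three Student's t statistics that the questions revolve around, plus how one-sided and two-sided p-values differ. The symbols (x̄, ȳ, s, n, μ₀, dᵢ, s_p, ν) are introduced only for this summary and are not the notation the questions ask you to use; ν denotes degrees of freedom and T_ν a t-distributed random variable.

```latex
\begin{aligned}
\text{One-sample:}\quad & t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, && \nu = n - 1 \\
\text{Paired:}\quad & t = \frac{\bar{d}}{s_d/\sqrt{n}}, \quad d_i = x_i - y_i, && \nu = n - 1 \\
\text{Two-sample (pooled):}\quad & t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{1/n_x + 1/n_y}}, \quad
  s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}, && \nu = n_x + n_y - 2 \\
\text{p-values:}\quad & p_{\text{one-sided}} = P(T_\nu \ge t) \ \text{for } H_a\!: \mu > \mu_0,
  \qquad p_{\text{two-sided}} = 2\,P(T_\nu \ge |t|)
\end{aligned}
```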
My primary goal in giving you this exercise is to reinforce the importance of basic statistics and significance testing. Feel free to look up any and every resource on the Web to brush up on what you need to know. We’re aiming for conceptual clarity and application, not memorization.
| Test subject (‘student’) | GPA (Fall 2018) | GPA (Spring 2019) |
|---|---|---|
| 1 | 3.44481587332 | 3.63235695644 |
| 2 | 3.40753716919 | 3.2735320054 |
| 3 | 3.67967040671 | 3.47597264719 |
| 4 | 3.49235971237 | 3.45727477966 |
| 5 | 3.35806029563 | 3.20981902735 |
| 6 | 3.59876412408 | 3.56866681634 |
| 7 | 3.20857956506 | 3.15388226146 |
| 8 | 3.49077194424 | 3.56383533036 |
| 9 | 3.4864916754 | 3.62542408166 |
| 10 | 3.39281679695 | 3.00151409103 |
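If you use Python, a minimal sketch of loading the table and computing the summary statistics every t-test needs might look like this. The variable names `fall_2018` and `spring_2019` are illustrative choices, not names required by hw-1.py; only the standard-library `statistics` module is used.

```python
from statistics import mean, stdev

# GPA values transcribed from the table above, subjects 1..10 in row order.
fall_2018 = [3.44481587332, 3.40753716919, 3.67967040671, 3.49235971237, 3.35806029563,
             3.59876412408, 3.20857956506, 3.49077194424, 3.4864916754, 3.39281679695]
spring_2019 = [3.63235695644, 3.2735320054, 3.47597264719, 3.45727477966, 3.20981902735,
               3.56866681634, 3.15388226146, 3.56383533036, 3.62542408166, 3.00151409103]

for label, gpas in (("Fall 2018", fall_2018), ("Spring 2019", spring_2019)):
    # stdev() is the *sample* standard deviation (divisor n - 1), which the t-test uses.
    print(f"{label}: n = {len(gpas)}, mean = {mean(gpas):.4f}, s = {stdev(gpas):.4f}")
```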
1. [10 points] The Dean wants to know whether the students’ average GPA is at least 3.5; in previous semesters, the default assumption was that it was below 3.5. Using g to represent the sample mean GPA for fall and G to represent the population mean GPA, write down the Dean’s null (H0) and alternative (Ha) hypotheses.
2. [10 points] Assuming the population variances of the two semesters are equal (but unknown), conduct a Student’s t-test for each semester individually. Can you reject the null hypothesis in either semester?
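One way to sketch a one-sample, one-sided test in Python is with SciPy's `scipy.stats.ttest_1samp`, cross-checked against the formula by hand. This assumes SciPy 1.6 or later (for the `alternative` argument) and assumes the alternative is an upper-tail one (population mean greater than 3.5); whether that matches the hypotheses you wrote in question 1 is for you to confirm. The helper `one_sample_t` and the constant `MU_0` are hypothetical names introduced for illustration.

```python
import math
from statistics import mean, stdev

from scipy import stats

MU_0 = 3.5  # GPA threshold from question 1 (hypothetical constant name)

fall_2018 = [3.44481587332, 3.40753716919, 3.67967040671, 3.49235971237, 3.35806029563,
             3.59876412408, 3.20857956506, 3.49077194424, 3.4864916754, 3.39281679695]
spring_2019 = [3.63235695644, 3.2735320054, 3.47597264719, 3.45727477966, 3.20981902735,
               3.56866681634, 3.15388226146, 3.56383533036, 3.62542408166, 3.00151409103]


def one_sample_t(sample, mu0):
    """Hypothetical helper: t = (mean - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n)), n - 1


results = {}
for label, gpas in (("Fall 2018", fall_2018), ("Spring 2019", spring_2019)):
    t_manual, df = one_sample_t(gpas, MU_0)
    # Upper-tail p-value by hand, from the t distribution's survival function.
    p_manual = stats.t.sf(t_manual, df)
    # alternative="greater" encodes the assumed Ha: population mean > MU_0.
    res = stats.ttest_1samp(gpas, MU_0, alternative="greater")
    results[label] = (t_manual, p_manual, res.statistic, res.pvalue)
    print(f"{label}: t = {res.statistic:.4f} (df = {df}), one-sided p = {res.pvalue:.4f}, "
          f"reject H0 at alpha = 0.05: {res.pvalue < 0.05}")
```

Flipping the direction of Ha would mean passing `alternative="less"` instead; a two-sided test is the default when `alternative` is omitted.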
3. [15 points] You are asked to determine if the average GPA has changed over the two semesters. You decide to conduct a paired t-test, assuming (as we will see, unwisely) that the same test subjects were sampled for their GPAs in both fall and spring. What do you find? Is the difference between the average GPAs statistically significant?
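A minimal sketch of the paired test with SciPy's `scipy.stats.ttest_rel`, assuming (as this question does) that row i of both columns is the same student, and assuming a two-sided alternative, since the question asks only whether the average changed, not in which direction. The cross-check illustrates that a paired t-test is just a one-sample t-test of the per-student differences against zero.

```python
from scipy import stats

fall_2018 = [3.44481587332, 3.40753716919, 3.67967040671, 3.49235971237, 3.35806029563,
             3.59876412408, 3.20857956506, 3.49077194424, 3.4864916754, 3.39281679695]
spring_2019 = [3.63235695644, 3.2735320054, 3.47597264719, 3.45727477966, 3.20981902735,
               3.56866681634, 3.15388226146, 3.56383533036, 3.62542408166, 3.00151409103]

# Paired test: pairs fall_2018[i] with spring_2019[i]; two-sided by default.
paired = stats.ttest_rel(fall_2018, spring_2019)

# Equivalent formulation: one-sample t-test of per-student differences against 0.
diffs = [f - s for f, s in zip(fall_2018, spring_2019)]
on_diffs = stats.ttest_1samp(diffs, 0.0)

print(f"paired: t = {paired.statistic:.4f} (df = {len(diffs) - 1}), "
      f"two-sided p = {paired.pvalue:.4f}, significant at 5%: {paired.pvalue < 0.05}")
```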
4. [15 points] In a conversation with the person who sampled the data, you now discover that the ten subjects in the fall semester are not the same as the subjects in the spring semester (however, you are told that you can still make the equal-variance assumption about both semesters). Should you still go with the results above, or conduct a different t-test? If you conduct a different test, what do you find now?
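For two independent samples with equal variances, a sketch using SciPy's `scipy.stats.ttest_ind` with `equal_var=True` (Student's pooled-variance test) might look like the following, cross-checked against the pooled formula by hand. It again assumes a two-sided alternative; the helper `pooled_t` is a hypothetical name introduced for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats

fall_2018 = [3.44481587332, 3.40753716919, 3.67967040671, 3.49235971237, 3.35806029563,
             3.59876412408, 3.20857956506, 3.49077194424, 3.4864916754, 3.39281679695]
spring_2019 = [3.63235695644, 3.2735320054, 3.47597264719, 3.45727477966, 3.20981902735,
               3.56866681634, 3.15388226146, 3.56383533036, 3.62542408166, 3.00151409103]


def pooled_t(x, y):
    """Hypothetical helper: Student's two-sample t with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    # variance() is the sample variance (divisor n - 1).
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


t_manual, df = pooled_t(fall_2018, spring_2019)
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value by hand

# equal_var=True selects Student's pooled test; two-sided by default.
res = stats.ttest_ind(fall_2018, spring_2019, equal_var=True)
print(f"independent: t = {res.statistic:.4f} (df = {df}), two-sided p = {res.pvalue:.4f}, "
      f"significant at 5%: {res.pvalue < 0.05}")
```

Setting `equal_var=False` would instead run Welch's test, which drops the equal-variance assumption this question says you may keep.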
PART II
See hw-1.py