Statistics
WEEK 5 HOMEWORK 5
THIS WEEK INVOLVES READING NEW TABLES, THE t-TABLES. UNLIKE THE z-TABLES, WHERE YOU JUST NEEDED TO CALCULATE A z-VALUE TO FIND THE PROBABILITY IN THE TABLE, WITH THE t-TABLE YOU NEED THE t-VALUE AND THE DEGREES OF FREEDOM (df), WHICH IS SIMPLY N – 1. THE t-VALUES THEREFORE DEPEND ON THE SIZE OF THE SAMPLE. THE t-DISTRIBUTION IS A BELL CURVE, SYMMETRICAL JUST LIKE THE NORMAL DISTRIBUTION, BUT FOR SMALL SAMPLES IT IS LOWER IN THE MIDDLE AND HAS FATTER TAILS; THE LARGER THE SAMPLE SIZE, THE MORE THE t-DISTRIBUTION SHAPE LOOKS LIKE THE NORMAL DISTRIBUTION SHAPE.
THE t-TABLES ARE IN OUR COURSE CONTENT > COURSE RESOURCES (NEAR THE BOTTOM OF CONTENT) > STATISTICAL RESOURCES. I SUGGEST YOU PRINT IT OUT AND REVIEW IT AS YOU READ THESE INSTRUCTIONS:
LET’S SAY YOU WANT TO BE 99% CONFIDENT (LEVEL OF CONFIDENCE) IN YOUR CONFIDENCE INTERVAL (CI). THIS LEAVES A 1% CHANCE THAT THE INTERVAL DOES NOT CAPTURE THE POPULATION PARAMETER (LIKE THE TRUE MEAN) YOU ARE ESTIMATING. KEEP IN MIND THAT THE CI HAS AN UPPER LIMIT AND A LOWER LIMIT, SO THIS 1% COVERS BOTH ENDS. THIS MEANS THAT THE PROBABILITY IN EACH TAIL IS 1% / 2 = 0.5% (OR 0.005). THINK ABOUT IT. NOTE THAT IN THE TABLE THE 99% CI (BOTTOM LINE) EQUATES TO 0.005 ON THE TOP LINE. ALL WE NEED NOW IS THE DEGREES OF FREEDOM (df), WHICH IS N – 1, TO GET OUR CRITICAL t-VALUE FROM THE BODY OF THE TABLE. IN THIS EXAMPLE THE df IS 14, SO THE t-VALUE IS 2.977. THIS t-VALUE IS A CRITICAL VALUE, JUST LIKE A CRITICAL z-VALUE.
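AS AN OPTIONAL CHECK ON THE ARITHMETIC ABOVE, HERE IS A SHORT PYTHON SKETCH THAT SPLITS THE LEFTOVER PROBABILITY BETWEEN THE TWO TAILS. THE FUNCTION NAME tail_area IS MADE UP FOR THIS ILLUSTRATION, AND THE CRITICAL t-VALUE IS SIMPLY COPIED FROM THE PRINTED TABLE, NOT COMPUTED. YOU STILL NEED THE TABLE FOR THE HOMEWORK.

```python
def tail_area(confidence_level):
    """Return (alpha, alpha / 2) for a two-sided confidence interval."""
    alpha = 1.0 - confidence_level
    return alpha, alpha / 2.0


# Critical t-value read from the printed t-table for df = 14 and a
# one-tail area of 0.005. It is hard-coded here, not calculated.
T_CRITICAL_DF14_99 = 2.977

alpha, one_tail = tail_area(0.99)
print(f"alpha = {alpha:.3f}, area in each tail = {one_tail:.3f}")
print(f"critical t-value from the table (df = 14): {T_CRITICAL_DF14_99}")
```

THE ONE-TAIL AREA IS THE NUMBER YOU LOOK FOR ON THE TOP LINE OF THE TABLE; THE df PICKS THE ROW.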
THE t-TABLES ARE DIFFERENT FROM THE z-TABLES IN THAT WITH THE “t” WE SPECIFY THE PROBABILITY WE WANT AND THEN FIND THE CRITICAL t-VALUE. WITH THE NORMAL DISTRIBUTION, WE CALCULATE THE z-VALUE FIRST AND THEN COMPARE IT TO THE CRITICAL z-VALUE FOR OUR DESIRED LEVEL OF CONFIDENCE (90%, 95%, OR 99%) TO SEE IF OUR CALCULATED z-VALUE IS FURTHER OUT IN EITHER TAIL, HENCE “UNUSUAL”.
THE STANDARD NORMAL DISTRIBUTION CURVE IS ONE CURVE AND DOES NOT CHANGE WITH SAMPLE SIZE, HENCE WE CAN USE THE SAME CRITICAL z-VALUES FOR EVERY PROBLEM (ONE EACH FOR 99%, 95%, AND 90%). THE t-DISTRIBUTION CURVE STARTS OUT AS A FLATTER BELL CURVE WITH FATTER TAILS AND SLOWLY NARROWS TOWARD THE NORMAL CURVE AS THE SAMPLE SIZE (THE df) INCREASES. YOU CAN SEE FROM THE t-TABLE THAT WHEN THE df APPROACHES 1000, WE ARE VERY CLOSE TO THE NORMAL DISTRIBUTION’S BELL SHAPE. BUT, SINCE THE t-SHAPE CHANGES, THERE ARE DIFFERENT CRITICAL t-VALUES DEPENDING ON THE df (SAMPLE SIZES) INVOLVED.
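TO SEE THE FIXED CRITICAL z-VALUES FOR YOURSELF, HERE IS A SMALL PYTHON SKETCH USING THE STANDARD LIBRARY’S NormalDist CLASS. THE HELPER NAME z_critical IS AN ASSUMPTION MADE FOR THIS EXAMPLE. THE t-VALUE SHOWN FOR COMPARISON IS THE df = 14 TABLE VALUE QUOTED EARLIER, TYPED IN BY HAND.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical z-value for the standard normal distribution."""
    upper_tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - upper_tail)


# These do not depend on sample size, so they are the same in every problem.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: critical z = {z_critical(level):.3f}")

# Table value quoted in the notes for 99% confidence with df = 14.
t_from_table = 2.977
print(f"99% confidence, df = 14: critical t = {t_from_table} "
      f"versus critical z = {z_critical(0.99):.3f}")
```

THE CRITICAL t-VALUE FOR A SMALL SAMPLE IS LARGER THAN THE MATCHING CRITICAL z-VALUE, SO A t-BASED INTERVAL IS WIDER THAN A z-BASED ONE AT THE SAME CONFIDENCE LEVEL.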
HERE IS WHAT THE CONFIDENCE INTERVAL MEANS GRAPHICALLY. IN THIS SAMPLE PROBLEM WE WANTED A CONFIDENCE INTERVAL AT THE 95% CONFIDENCE LEVEL, BASED ON A SAMPLE WITH A MEAN OF 3.26, A STD DEV OF 1.019 AND A SAMPLE SIZE OF 39 (df = 39 – 1 = 38). THE CRITICAL t-VALUE FROM THE TABLE IS 2.023 (WE DON’T HAVE 38 df IN THE TABLE, SO WE APPROXIMATED FROM THE NEAREST df ROWS). USE THE TABLE AND NOT SOFTWARE FOR THESE PROBLEMS. NOTE THAT THE 95% LEVEL MEANS 2.5% OF THE AREA UNDER THE t-CURVE LIES IN EACH TAIL, BEYOND THE INTERVAL LIMITS.
NOW WE HAVE OUR t-VALUE = 2.023. REMEMBER WHEN WE STANDARDIZED OUR RAW DATA POINTS (THE X-VALUES) USING: Z = (X – MEAN) / STD DEV. IF WE WERE GIVEN THE z-VALUE (THE NUMBER OF STANDARD DEVIATIONS FROM THE MEAN), WE COULD BACK-CALCULATE THE X-VALUE OR ACTUAL DATA POINT: X = Z * STD DEV + MEAN.
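THE TWO FORMULAS ABOVE UNDO EACH OTHER, WHICH THIS SHORT PYTHON SKETCH DEMONSTRATES. THE FUNCTION NAMES AND THE NUMBERS (MEAN 70, STD DEV 12, X = 88) ARE MADE UP PURELY FOR ILLUSTRATION.

```python
def z_score(x, mean, std_dev):
    """Standardize a raw data point: Z = (X - MEAN) / STD DEV."""
    return (x - mean) / std_dev


def x_from_z(z, mean, std_dev):
    """Back-calculate the raw data point: X = Z * STD DEV + MEAN."""
    return z * std_dev + mean


# Hypothetical values, chosen only to make the arithmetic easy to follow.
mean, std_dev = 70.0, 12.0
x = 88.0

z = z_score(x, mean, std_dev)
x_again = x_from_z(z, mean, std_dev)
print(f"z = {z}, back-calculated x = {x_again}")
```

A DATA POINT 18 UNITS ABOVE THE MEAN, WITH A STD DEV OF 12, SITS 1.5 STANDARD DEVIATIONS OUT, AND CONVERTING BACK RETURNS THE ORIGINAL 88.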
WITH THE t-VALUE IT’S A LITTLE DIFFERENT SINCE SAMPLE SIZE MATTERS. HERE WE MUST FIRST CALCULATE AN “ERROR BOUND” FOR THE MEAN (EBM), WHICH EQUALS: t-VALUE * STD DEV / SQ RT OF THE SAMPLE SIZE N.
PLUGGING IN WE HAVE: EBM = 2.023 * 1.019 / √39 = 0.3301. SIMPLY ADD AND SUBTRACT 0.3301 FROM THE SAMPLE MEAN OF 3.26 TO GET THE CONFIDENCE INTERVAL LIMITS IN ACTUAL X-VALUES: 2.9299 TO 3.5901.
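IF YOU WANT TO CHECK YOUR CALCULATOR WORK ON THE SAMPLE PROBLEM, THE STEPS CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE FUNCTION NAMES ARE INVENTED FOR THIS EXAMPLE, AND THE CRITICAL t-VALUE IS STILL TAKEN FROM THE PRINTED TABLE RATHER THAN COMPUTED.

```python
from math import sqrt


def error_bound_mean(t_critical, sample_std_dev, n):
    """EBM = t-value * std dev / square root of the sample size."""
    return t_critical * sample_std_dev / sqrt(n)


def confidence_interval(sample_mean, ebm):
    """Lower and upper limits: sample mean minus and plus the EBM."""
    return sample_mean - ebm, sample_mean + ebm


# Values from the sample problem in the notes.
t_critical = 2.023      # read from the t-table
sample_mean = 3.26
sample_std_dev = 1.019
n = 39

ebm = error_bound_mean(t_critical, sample_std_dev, n)
lower, upper = confidence_interval(sample_mean, ebm)
print(f"EBM = {ebm:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

NOTICE THAT N SITS UNDER A SQUARE ROOT IN THE DENOMINATOR, SO A LARGER SAMPLE SHRINKS THE EBM AND NARROWS THE INTERVAL.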
STATING WHAT THE CONFIDENCE INTERVAL MEANS IS DIFFERENT THAN YOU MIGHT FIRST THINK.
THE CORRECT STATEMENT IS: “WE ARE 95% CONFIDENT THAT THE TRUE POPULATION MEAN FALLS IN THIS CONFIDENCE INTERVAL,” MEANING THAT ABOUT 95% OF INTERVALS BUILT THIS WAY FROM REPEATED SAMPLES WOULD CAPTURE THE TRUE MEAN. IT IS NOT: “95% OF OUR SAMPLE MEANS WILL FALL IN THIS INTERVAL.” BE CAREFUL.
HERE ARE THE WEEK 5 HOMEWORK PROBLEMS.
LANE-CHAPTER 10 TYPE PROBLEMS.
#1. BRIEFLY EXPLAIN THE RELATIONSHIPS AMONG SAMPLES, POPULATIONS, STATISTICS, AND PARAMETERS.
#2. HOW ARE BIAS AND ACCURACY RELATED?
#3. WHY DOES A CONFIDENCE INTERVAL GET WIDER THE MORE CONFIDENT YOU WANT TO BE?
#4. THE STUDENT t-DISTRIBUTION WITH ITS t-TABLES IS INTRODUCED THIS WEEK. WE HAVE PREVIOUSLY BEEN USING THE z-TABLES FOR THE NORMAL DISTRIBUTION. HOW DO WE KNOW WHICH TO USE?
#5. IN THE CONCEPTS AND RELATED PROBLEMS THIS WEEK WE INTRODUCE CONFIDENCE INTERVALS AND THE “STANDARD ERROR”. WE USE THIS “SE” TO DETERMINE THE LIMITS OF OUR CONFIDENCE INTERVAL. OF COURSE THE WIDTH OF THE “CI” ALSO DEPENDS ON JUST HOW “CONFIDENT” WE WANT TO BE: 90%, 95% OR 99%. AND, YOU EXPLAINED ABOVE WHY THE CI GETS WIDER AS OUR REQUIRED CONFIDENCE LEVEL INCREASES.
SO, WHAT IS THE STANDARD ERROR (NOT THE FORMULA) AND WHAT DOES IT MEASURE? ARE WE STILL JUST MEASURING DISTANCES FROM SOME LINE AND SQUARING THEM TO GET RID OF NEGATIVES? IS THERE A SQUARE ROOT INVOLVED? WHAT DO YOU THINK?
#6. YOU ALL HAVE ENJOYED DOING THE COMPLEX CALCULATIONS RELATED TO BINOMIAL PROBLEMS ;-) THE BINOMIAL PROBLEMS DEALT WITH DISCRETE DATA LIKE “HEADS/TAILS” OR “WIN/LOSE” AND YOU HAD TO USE THE COMPLEX EQUATION TO FIND THE PROBABILITY OF EACH SITUATION LIKE TOSSING 4 OR 5 HEADS OUT OF 5 TOSSES. YOU HAVE ALSO USED FORMULAS THAT ALLOW US TO TREAT DISCRETE DATA AS “CONTINUOUS” DATA (NORMAL APPROXIMATION TO THE BINOMIAL).
A “WRINKLE” FOR CONFIDENCE INTERVALS IS THAT THERE IS A “CORRECTION FACTOR” THAT MUST BE APPLIED (CHECK LANE AROUND PAGE 360, UNDER THE SECTION ENTITLED “PROPORTIONS”, FOR THAT CORRECTION FORMULA). PROPORTIONS MEANS PERCENTAGES, LIKE 35 OUT OF 100 = 35%.
THIS IS A PROBLEM THAT REQUIRES USING THAT CORRECTION FACTOR TO SOLVE A DISCRETE DATA PROBLEM AS A CONTINUOUS DATA PROBLEM (SOLVE IT):
A person claims to be able to predict the outcome of flipping a coin. This person is correct 18/24 times. Compute the 95% confidence interval on the proportion of times this person can predict coin flips correctly. What conclusion can you draw about this test of his ability to predict the future (at least regarding coin toss results)?
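TO SHOW THE SHAPE OF THE CALCULATION WITHOUT DOING THE PROBLEM FOR YOU, HERE IS A PYTHON SKETCH ON MADE-UP DATA (60 SUCCESSES OUT OF 100, NOT THE COIN-FLIP NUMBERS). IT ASSUMES THE CORRECTION FACTOR IS 0.5 / N, ADDED TO THE UPPER LIMIT AND SUBTRACTED FROM THE LOWER LIMIT; CONFIRM THAT AGAINST THE “PROPORTIONS” SECTION IN LANE BEFORE RELYING ON IT. THE FUNCTION NAME proportion_ci IS HYPOTHETICAL.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence_level=0.95):
    """Confidence interval on a proportion via the normal approximation.

    Assumes a continuity correction of 0.5 / n on each limit
    (an assumption to verify against the textbook).
    """
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    standard_error = sqrt(p * (1.0 - p) / n)
    correction = 0.5 / n
    margin = z * standard_error + correction
    return p - margin, p + margin


# Hypothetical data for illustration only; NOT the homework numbers.
lower, upper = proportion_ci(60, 100)
print(f"95% CI on the proportion: ({lower:.4f}, {upper:.4f})")
```

ONCE YOU HAVE AN INTERVAL, CHECK WHETHER THE VALUE YOU WOULD EXPECT FROM PURE GUESSING FALLS INSIDE IT OR OUTSIDE IT.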
#7. THIS PROBLEM HAS TWO PARTS: THE FIRST USES THE z-TABLES FOR THE NORMAL DISTRIBUTION AND THE SECOND REQUIRES THE t-TABLES. DO YOU UNDERSTAND WHY? (THIS IS VERY IMPORTANT.)
You take a sample of 30 from a large population of test scores, and the mean of your sample is 70.
(a) You know that the standard deviation of the population is 12. What is the 99% confidence interval on the population mean?
(b) Now assume that you do NOT know the population standard deviation, but the standard deviation of your sample is 12. What is the 99% confidence interval on the mean now?
#8. WE ARE DEFINITELY INTO “INFERENTIAL” STATISTICS NOW AND ENTERING INTO THE MOST IMPORTANT TOPIC WE COVER: HYPOTHESIS TESTING. IN THIS PROBLEM WE USE THE CONFIDENCE INTERVAL TO ACTUALLY TEST A HYPOTHESIS OR “GUT FEELING” THAT WE HAVE.
You read about a survey in a newspaper and find that 75% of the 275 people sampled prefer Candidate A. You are surprised by this survey because you thought that only 50% of the population would prefer this candidate. Based on this sample, is 50% a possible population proportion? Compute the 95% confidence interval to be sure and decide for yourself whether based on this sample, 50% is a possible population proportion (proportion simply refers to percentage).
A HYPOTHESIS IS KIND OF AN EDUCATED GUESS, AND IN HYPOTHESIS TESTING THERE ARE ALWAYS TWO HYPOTHESES: THE NULL HYPOTHESIS (Ho), WHICH MUST ALWAYS HAVE THE EQUALS IN IT, AND THE ALTERNATE HYPOTHESIS (Ha), WHICH MUST ACCOUNT FOR THE REMAINING POSSIBILITIES. HERE OUR Ho IS THAT THE PROPORTION EQUALS (OR IS GREATER THAN) 75% AND THE ALTERNATE HYPOTHESIS Ha IS THAT THE PROPORTION IS LESS THAN 75% (LIKE 50% WOULD BE).
AS YOU WILL LEARN IN THE UPCOMING WEEKS, WE COVER THREE WAYS TO PERFORM A HYPOTHESIS TEST: CONFIDENCE INTERVALS, TEST STATISTICS, AND PROBABILITY VALUES (P-VALUES). THE LATTER TWO ALWAYS LEAD TO THE SAME CONCLUSION, BUT CONFIDENCE INTERVALS MAY NOT AGREE.
AFTER WE DO OUR ANALYSIS WE EITHER “ACCEPT” OR “REJECT” Ho. (YOU SHOULD USE THOSE TERMS TO BE CLEAR.) BE CAREFUL THOUGH: REJECTING Ho DOES NOT AUTOMATICALLY MEAN WE ACCEPT Ha. THERE MAY BE OTHER WRINKLES.
ILLOWSKY CHAPTER 8
#9. BELOW IS A TABLE SHOWING THE NUMBER OF ZIKA MOSQUITOES FOUND IN 5 NEIGHBORHOODS IN A SMALL CITY. WE WANT TO CALCULATE THE CONFIDENCE INTERVAL AROUND THE TRUE MEAN AT A CONFIDENCE LEVEL OF 99%.
| # SAMPLES | MOSQ / SAMPLE | TOTAL MOSQ |
| 2         | 1             | 2          |
| 3         | 2             | 6          |
| 4         | 3             | 12         |
| 5         | 4             | 20         |
| 1         | 5             | 5          |
| 15        |               | 45         |
THE TRICKY PART HERE IS TO GET THE VARIANCE (AND STD DEV). NOTE THAT THERE ARE 15 SAMPLES AND YOU ARE GIVEN THE NUMBER OF MOSQUITOES IN EACH OF THEM. TRY THIS TABLE:
| SAMPLE #   | # MOSQ IN SAMPLE | # MOSQ - MEAN | (# MOSQ - MEAN)^2 |
| 1          | 1                | -2.00         | 4.00              |
| 2          | 1                |               |                   |
| 3          | 2                |               |                   |
| 4          | 2                |               |                   |
| 5          | 2                |               |                   |
| 6          | 3                |               |                   |
| 7          | 3                |               |                   |
| 8          | 3                |               |                   |
| 9          | 3                |               |                   |
| 10         | 4                |               |                   |
| 11         | 4                |               |                   |
| 12         | 4                |               |                   |
| 13         | 4                |               |                   |
| 14         | 4                |               |                   |
| 15         | 5                |               |                   |
| TOTAL MOSQ |                  | TOTAL         |                   |
MEAN = _______________, VARIANCE = ___________________, STANDARD DEVIATION = _______________
CRITICAL t- VALUE IS ____________, THE EBM IS _________ AND THE 99% CI IS __________________________
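THE TABLE METHOD ABOVE (LIST EVERY SAMPLE, SUBTRACT THE MEAN, SQUARE, ADD UP, DIVIDE BY N – 1) CAN BE SKETCHED IN PYTHON AS SHOWN HERE. THE DATA ARE MADE UP AND ARE NOT THE MOSQUITO COUNTS, AND THE FUNCTION NAMES ARE INVENTED FOR THIS EXAMPLE. IT ASSUMES YOU WANT THE SAMPLE VARIANCE, WHICH DIVIDES BY N – 1.

```python
from math import sqrt


def expand(frequency_table):
    """Turn (value, number_of_samples) pairs into one long list of values."""
    values = []
    for value, count in frequency_table:
        values.extend([value] * count)
    return values


def sample_mean_and_variance(values):
    """Mean and sample variance (sum of squared deviations over n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)
    return mean, variance


# Hypothetical (value, number_of_samples) pairs; NOT the homework data.
table = [(2, 1), (4, 3), (6, 1)]

values = expand(table)
mean, variance = sample_mean_and_variance(values)
std_dev = sqrt(variance)
print(f"values = {values}")
print(f"mean = {mean}, variance = {variance}, std dev = {std_dev:.4f}")
```

THE KEY STEP IS EXPANDING THE GROUPED TABLE SO THAT EVERY SAMPLE GETS ITS OWN ROW BEFORE YOU COMPUTE THE DEVIATIONS FROM THE MEAN.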
#10. AD AGENCIES are interested in knowing the population percent (PROPORTION) of women who make the majority of CAR BUYING decisions. If you were designing this study to determine this population proportion, what is the minimum number (N) you would need to survey to be 95% confident that the population proportion is estimated to within 5% (0.05)? THIS IS A z-VALUE PROPORTION (PERCENTAGE) PROBLEM (NOT t-VALUE)
b. WHAT IS THE FORMULA WE USE TO FIND “n”?
c. WHAT ARE THE VALUES FOR p’ AND q’? (HINT: THIS IS A BINOMIAL SITUATION: MALE/FEMALE, SO ASSUME 50:50 OR 0.5 AND 0.5.)
d. WHAT ARE THE VALUES OF EBP AND EBP^2?
e. WHAT ARE THE VALUES OF α AND α / 2 ?
f. USING THE z-TABLES (NOT THE t-TABLES) WHAT IS THE CRITICAL z-VALUE FOR THIS α / 2 ?
g. PLUG THESE VALUES INTO THE FORMULA FOR “n” AND YOU GET n = _________
(YOU CAN REFER TO ILLOWSKY Section 8.3 and SAMPLE PROBLEM 8.14 AROUND PAGE 463 FOR OTHER GUIDANCE)