Stat 200 Project 2 2017b Name _____

Instructions: For best readability, print this document from a PDF viewer, not a web browser. Printing directly from a website often omits details or changes the fonts. Click "download", then "open" to automatically load the PDF into a viewer. If your work computer is blocked from printing, use another one (such as one in the UMUC lab).

Fill in the parts below and turn in the completed 2-page document at a face-to-face class. With prior approval you may scan and email it (as a PDF or images) to avoid missing the due date.

To facilitate grading, this project printout should be completed by hand (writing or marking). It may instead be completed using a PDF editor, as long as the 2-page limit and format remain. It is also possible to complete it with a word processor; in that case, attach the entire PDF as a cover sheet (single- or double-sided paper).

Problems 1 – 6 below use the adjacent Figure 1 graph → It shows a selective breeding program for mice, with clear divergence after 30 generations for the High, Control, and Low open-field activities (whatever that actually is). For convenience, assume the central limit theorem applies, and treat H1 & H2 together (the "H" group) and likewise L1 & L2 (the "L" group). The more interesting question is "When does the difference between H and L become significant?" (from one interval to the next). The general procedure is to pick 2 adjacent intervals on the Generation axis, estimate the H and L values in each, then test the null hypothesis that their means are the same. If the null is retained, the interval gap is widened and the steps are repeated until the null is rejected.

1 Visually inspect Figure 1 for a place where H & L seem to be close together in one interval and farther apart in the next. You could start with S1 and S2, or any other adjacent grouping where this pattern seems to hold (S0 is the left edge of the chart). On the graph, circle (or highlight with a marker) your initial intervals. Circle (highlight) the sections of the H1 & L1 lines above the interval markers. You'll estimate your data from those.

Also write your selected intervals here: S__ S__

2 For the H data, estimate the starting, ending, and middle values for both solid lines (6 observations). For the L data, estimate the values for the dots in each interval (the number of observations here depends on the steepness of the dotted sections). There may be duplicate values. Write these numbers in the margin of the graph, or here:

values for H:

values for L:

3 We know the H and L series diverge. Briefly explain... a) why a 1-tailed (or 2-tailed) hypothesis test is more suitable

b) what is the appropriate distribution to use, and why?

c) write in symbols, Ho: and Ha:

4 Write the name of the app (or website address) used:

5 Write the test value, critical value, and the p-value below, along with relevant degrees of freedom.

6 a) What is the conclusion regarding the hypothesis test?

b) If the null is retained in 'a', widen the selection by moving the right-side interval out by one (if your originals were S4 & S5, they now become S4 & S6, etc.) and repeat #2, #5, and #6a until the null is rejected. Indicate the pair of intervals where this first happens: S__ S__
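
As a rough illustration of steps 5 and 6, the sketch below assumes a two-sample t-test with unequal variances (Welch's test) and a one-tailed alternative that H exceeds L. The handout leaves both choices to you, so treat them as assumptions. The function names (`welch_t`, `upper_tail_p`, `first_rejecting_pair`), the 0.05 significance level, and every number in `estimates` are hypothetical placeholders, not values read from Figure 1. It uses only the Python standard library and approximates the p-value by numerically integrating the t density; the critical value would still come from an app or a table.

```python
import math
from statistics import mean, variance


def welch_t(h, l):
    """Return (t, df) for mean(H) - mean(L) using Welch's unequal-variance test."""
    vh = variance(h) / len(h)  # squared standard error of the H mean
    vl = variance(l) / len(l)  # squared standard error of the L mean
    t = (mean(h) - mean(l)) / math.sqrt(vh + vl)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (vh + vl) ** 2 / (vh ** 2 / (len(h) - 1) + vl ** 2 / (len(l) - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    width = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    area = total * width / 3          # area under the density between 0 and |t|
    p = max(0.0, 0.5 - area)          # upper tail beyond |t|
    return p if t >= 0 else 1.0 - p


def first_rejecting_pair(start, estimates, alpha=0.05):
    """Try each right-hand interval in order; stop at the first rejection."""
    for right in sorted(estimates):
        h, l = estimates[right]
        t, df = welch_t(h, l)
        p = upper_tail_p(t, df)
        if p < alpha:
            return (start, right), t, df, p
    return None  # the null was retained for every pair tried


# HYPOTHETICAL estimates keyed by the right-hand interval (left interval is S1).
# Each entry is (values for H, values for L); replace with your own readings.
estimates = {
    2: ([10, 12, 14, 11, 13, 15], [9, 11, 13, 12]),
    3: ([12, 16, 20, 14, 18, 22], [8, 9, 10, 9, 11]),
}

result = first_rejecting_pair(1, estimates)
if result is not None:
    (left, right), t, df, p = result
    print(f"First rejection at S{left} & S{right}: t={t:.3f}, df={df:.2f}, p={p:.4f}")
```

With these made-up numbers the null is retained for S1 & S2 and first rejected for S1 & S3. For a 2-tailed test, the p-value would be twice the smaller tail area.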

- - - - New, separate problem: Boxes A and B are the population - - you aren't allowed to change their contents.

"Box A has 10 apples and Box B has 200 oranges and 120 apples. Someone draws 5 items from both boxes, 3 from Box A and 2 from Box B, ending up with more apples than oranges even though there are 200 oranges total and 130 apples total."

7. a) Short version: "what's wrong with the way they did it?" More specifically, what are (at least) 2 modifications to use so the sample better represents the population?

b) If 5 items each are randomly selected from A and from B, does oversampling occur? If yes, where?
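
To make the Box A / Box B numbers concrete, here is a small sketch that computes each box's sampling fraction (items drawn divided by items in the box) and what a proportional allocation of the draws would look like. The box contents and draw counts come from the problem statement. The helper names (`sampling_fractions`, `proportional_allocation`) and the choice to compare against proportional allocation are illustrative assumptions, not the required answer.

```python
from fractions import Fraction

# Population, as stated in the problem
boxes = {
    "A": {"apple": 10, "orange": 0},
    "B": {"apple": 120, "orange": 200},
}


def box_size(name):
    """Total number of items in one box."""
    return sum(boxes[name].values())


total = sum(box_size(name) for name in boxes)  # all items in both boxes


def sampling_fractions(drawn):
    """Fraction of each box that ends up in the sample."""
    return {name: Fraction(n, box_size(name)) for name, n in drawn.items()}


def proportional_allocation(n):
    """Draws per box if the n draws were split in proportion to box size."""
    return {name: Fraction(n * box_size(name), total) for name in boxes}


apple_share = Fraction(
    sum(contents["apple"] for contents in boxes.values()), total
)

print("Population size:", total)
print("Apple share of population:", apple_share)
print("Fractions for 3 from A, 2 from B:", sampling_fractions({"A": 3, "B": 2}))
print("Fractions for 5 from each box:", sampling_fractions({"A": 5, "B": 5}))
print("Proportional split of 5 draws:", proportional_allocation(5))
```

Comparing the sampling fractions of the two boxes shows how unevenly each draw scheme covers the population.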