STATISTICS
Quiz 1_Stat200:
For question #1, a typical row of the stem-and-leaf plot might look like 3|0269. The vertical line is typed with Shift plus the "|\" key and is inserted at the cursor location.
Submit quiz via the Assignment Folder
See Descriptive Stats in Videos-Topics In Stats 200
(15%) 1. A survey of the number of calls received by a sample of Southern Phone Company subscribers last week revealed the following information.
52 43 30 38 30 42 12 46 39 37 34 46 32 18 41 5
a) Develop a stem-and-leaf chart. See above for tips on presenting Stem-Leaf plot.
b) What basic conclusions can be drawn from this chart?
c) Compute the 5 number system for this data set and list Xmin, Q1, Q2, Q3 and Xmax. See the Videos and show how the pointer was computed for each quartile.
(15%)
See Two Way Table in Videos- Topics In Stat 200/Probability Rules
2. For the following table, what is the value of:
a) P(A1)
b) P(B1|A2)
c) P(B2 and A3)? Compute this as P(B2)*P(A3|B2). Rows are B1 and B2; columns are A1, A2 and A3.
What are the values of P(B2), P(A3|B2) and P(B2&A3)?
In what row and column will you find this answer?
                Second Event
First Event  | A1 | A2 | A3 | Total
B1           | 2  | 1  | 3  | 6
B2           | 1  | 2  | 1  | 4
Total        | 3  | 3  | 4  | 10
(15 pnts.)
3. The table below gives the percentage of U.S. counties that used various methods for recording votes in 1980 and 2002. Create a side-by-side bar chart by year using the instruction videos for Side-By-Side in Content/Videos-Topics In Stat 200. Copy and paste the chart into the Word file you use for answers to this quiz.
Method          | 1980 (%) | 2002 (%)
Punch cards     | 18.5     | 15.5
Lever machines  | 36.7     | 10.6
Paper ballots   | 40.7     | 10.5
Optical scan    | 0.8      | 43.0
Electronic      | 0.2      | 16.3
Mixed           | 3.1      | 4.1
(15 pnts.)
See Descriptive Stats in Videos-Topics In Stat 200
4. The following is a sample of the weights of nine jars of peanut butter.
7.69 7.72 7.80 7.86 7.90 7.94 7.97 8.06 8.09
a) Compute the median weight.
b) Compute the standard deviation of the sample using the shortcut formula. Show the formula and the value of each term, then compute the answer. Use a screenshot to copy and paste from the LEO equation editor. See Discussion/Week 0 for instructions.
c) Compute the 5 number system for these data. Just list the values as Xmin = xxx, Q1 = xxx, Q2 = xxx, Q3 = xxx, Xmax = xxx.
d) Are there outliers? Answer yes or no. An outlier is a value that is unusually large or small according to the expressions:
Outlier > Q3 + 1.5(Q3 - Q1), or
Outlier < Q1 - 1.5(Q3 - Q1)
(15 pnts.)
5. Answer questions a. through f. of 3.23 below. See Content/Videos-Topics In Stat 200/Descriptive Stats.
Show work for part f. only.
(15 pnts.)
6. Compute the mean and standard deviation for the data in the table below. State your assumptions and show all calculations. See the video Mean & Variance for Grouped Data in Videos-Topics In Stat 200.
Distance  | Frequency
0 to 5    | 4
5 to 10   | 15
10 to 15  | 27
15 to 20  | 18
20 to 25  | 6
(10%)
7. A box contains 3 red balls and 4 green balls. If two balls are randomly selected in sequence, without replacement, what is the probability that a red ball and a green ball are picked out of the box? State the rule for P(A&B), then substitute the values and compute the answer. See probability rules for dependent events in Videos-Topics In Stat 200.
RESOURCES FOR QUIZ
TWO WAY TABLE
First, notice that the overall total of all events is 10. If this is not given, you must compute it as was done in the table above: add all occurrences in row B1 and in row B2, then sum the two row totals.
1. Find P(B2&A3).
Go to row B2 and column A3 to get the value of 1. The number of successes divided by the total possible outcomes is 1/10. So,
P(B2&A3) = 1/10
Alternatively, P(B2&A3) = P(B2)P(A3|B2) = (4/10)(1/4) = 1/10.
Note that P(B2) = 4/10. Given that you are in row B2, the probability that you are in column A3 is 1/4: row B2 has a total of 4 occurrences spread over columns A1, A2 and A3, and only one of them is in A3.
The result is the same: P(B2&A3) = 1/10.
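As a quick check on the arithmetic above, here is a minimal Python sketch that stores the two-way table in a dictionary (the layout and variable names are illustrative, not from the course materials) and computes P(B2&A3) both by direct counting and with the multiplication rule. Fractions keep the results exact.

```python
from fractions import Fraction

# Two-way table: rows are B1, B2; columns are A1, A2, A3 (hypothetical layout).
table = {
    "B1": {"A1": 2, "A2": 1, "A3": 3},
    "B2": {"A1": 1, "A2": 2, "A3": 1},
}

# Overall total of all occurrences.
grand_total = sum(sum(row.values()) for row in table.values())

# Direct count: the (B2, A3) cell divided by the overall total.
p_joint_direct = Fraction(table["B2"]["A3"], grand_total)

# Multiplication rule: P(B2) * P(A3 | B2).
row_total_b2 = sum(table["B2"].values())
p_b2 = Fraction(row_total_b2, grand_total)
p_a3_given_b2 = Fraction(table["B2"]["A3"], row_total_b2)
p_joint_rule = p_b2 * p_a3_given_b2

print(p_joint_direct, p_joint_rule)  # 1/10 1/10
```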
2. Find P(B1 or A2).
From the addition rule, P(B1 or A2) = P(B1) + P(A2) - P(B1&A2). So,
P(B1) = 6/10 because there are 6 occurrences of being in row B1
P(A2) = 3/10 because there are 3 occurrences of being in column A2
P(B1&A2) = 1/10 because there is a single occurrence in row B1 and column A2
P(B1 or A2) = 6/10 + 3/10 - 1/10 = 8/10 = 4/5
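The same addition-rule computation can be sketched in Python, again with a hypothetical dictionary layout for the table. The cross-check counts every cell in row B1 or column A2 directly, which should agree with the rule.

```python
from fractions import Fraction

# Two-way table: rows B1, B2; columns A1, A2, A3 (hypothetical layout).
table = {
    "B1": {"A1": 2, "A2": 1, "A3": 3},
    "B2": {"A1": 1, "A2": 2, "A3": 1},
}
grand_total = sum(sum(row.values()) for row in table.values())

p_b1 = Fraction(sum(table["B1"].values()), grand_total)                 # row B1
p_a2 = Fraction(sum(row["A2"] for row in table.values()), grand_total)  # column A2
p_b1_and_a2 = Fraction(table["B1"]["A2"], grand_total)                  # shared cell

# Addition rule: subtract the shared cell so it is not counted twice.
p_b1_or_a2 = p_b1 + p_a2 - p_b1_and_a2

# Cross-check: count every cell that lies in row B1 or in column A2.
in_b1_or_a2 = sum(
    count
    for r, row in table.items()
    for c, count in row.items()
    if r == "B1" or c == "A2"
)
print(p_b1_or_a2, Fraction(in_b1_or_a2, grand_total))  # 4/5 4/5
```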
3. Find P(B1|A1).
By observation, if you are given that you're in column A1, then you may be in row B1 or B2.
The probability of being in row B1 is 2/3, since column A1 has 2 occurrences in B1 and 1 in B2.
So, P(B1|A1) = 2/3
Alternatively, from the definition of conditional probability, P(B1|A1) = P(B1&A1)/P(A1) = (2/10)/(3/10) = 2/3.
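A short sketch of both routes to P(B1|A1), under the same assumed dictionary layout: restricting attention to column A1, and dividing P(B1&A1) by P(A1).

```python
from fractions import Fraction

# Two-way table: rows B1, B2; columns A1, A2, A3 (hypothetical layout).
table = {
    "B1": {"A1": 2, "A2": 1, "A3": 3},
    "B2": {"A1": 1, "A2": 2, "A3": 1},
}
grand_total = sum(sum(row.values()) for row in table.values())
col_total_a1 = sum(row["A1"] for row in table.values())

# By observation: only column A1 matters once we know A1 occurred.
p_by_observation = Fraction(table["B1"]["A1"], col_total_a1)

# By definition: P(B1 | A1) = P(B1 & A1) / P(A1).
p_by_definition = (
    Fraction(table["B1"]["A1"], grand_total) / Fraction(col_total_a1, grand_total)
)

print(p_by_observation, p_by_definition)  # 2/3 2/3
```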
Median & Mean
Median-Mean.swf
Definitions:
I. Median: the number of data values above the median is the same as the number below it.
a. Odd number of data values: the median is the middle value of the sorted data. Given the values:
3 4 7 10 12, the median is 7
b. Even number of data values: the median is the average of the two middle values of the sorted data. Given the values:
3 4 7 8 10 12, the median is (7+8)/2 = 7.5
II. Mean: The average of the values of data.
$$\bar{x} = \frac{\sum x_i}{n}$$
where ∑x_i is the sum of the values in the data set and n is the number of values. Given the data set 3 4 7 10 12, the mean is found from
∑x_i = 3 + 4 + 7 + 10 + 12 = 36 and n = 5, so the mean is 36/5 = 7.2
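The median and mean definitions can be sketched as two small Python functions (the names are illustrative; Python's standard statistics module also provides statistics.median and statistics.mean). The examples reuse the data sets above.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(median([3, 4, 7, 10, 12]))     # 7
print(median([3, 4, 7, 8, 10, 12]))  # 7.5
print(mean([3, 4, 7, 10, 12]))       # 7.2
```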
III. Mean & Median for skewed data.
According to Chapter 3 of the e-Text:
mean < median for left skewed data
mean > median for right skewed data
IV. Implications
If football players' salaries are right skewed, the players would bargain based on the median (the lower value), and the owners would bargain based on the mean (the larger value).
Variance & Std Dev
Variance_Std_Dev.swf
Definitions
I. Variance for a sample of n values
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
The equation above captures the fundamental idea behind the definition: the deviation of each data value from the mean is squared before averaging. Squaring weights values far from the mean very heavily, so if all the data values are close to the mean, the variance will be small compared with a data set whose values lie far from the mean.
A more convenient method for computing the variance is called the "short cut equation." It is derived from the equation above. I will spare you the details and present it as:
$$S^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$
See the video on Short Cut Equation to see how the equation editor is used for this.
Short_Cut_Eqn_Example.swf
II. Standard Deviation: the positive square root of the variance
$$S = \sqrt{S^2}$$
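To see that the definitional and short cut variance formulas agree, here is a sketch that computes both for the data set 3 4 7 10 12 from the Median & Mean section, checks them against Python's statistics.variance (which also divides by n - 1), and takes the positive square root for S. The function names are illustrative; tiny floating-point differences between the two formulas are expected.

```python
import math
import statistics


def variance_definition(xs):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Short cut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


data = [3, 4, 7, 10, 12]
s2_def = variance_definition(data)
s2_short = variance_shortcut(data)
s = math.sqrt(s2_short)  # standard deviation: positive root of the variance

print(round(s2_def, 4), round(s2_short, 4), round(statistics.variance(data), 4))  # 14.7 14.7 14.7
print(round(s, 4))  # 3.8341
```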
III. Z score (population)
$$Z = \frac{x - \mu}{\sigma}$$
Z is the number of standard deviations that a value x of the random variable lies from the population mean μ, where σ is the population standard deviation.
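A one-line sketch of the Z score. The mean, standard deviation and x values below are hypothetical, chosen only to show the sign convention: values above the mean give a positive Z, values below give a negative Z.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the population mean mu."""
    return (x - mu) / sigma


# Hypothetical population with mu = 70 and sigma = 10 (illustration only).
print(z_score(85, 70, 10))  # 1.5  (1.5 std dev above the mean)
print(z_score(55, 70, 10))  # -1.5 (1.5 std dev below the mean)
```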
a. Chebyshev's Rule: applies to any frequency distribution, skewed or symmetrical.
For any number k > 1, at least
$$1 - \frac{1}{k^2}$$
of the measurements will fall within ±k standard deviations of the mean, i.e., within the interval μ ± kσ.
For k = 2, 1 - 1/4 = 3/4, so at least 75% of the values of the random variable will fall within ±2 standard deviations of the mean, i.e., in the range μ ± 2σ.
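Chebyshev's lower bound is easy to tabulate. This sketch (the function name is hypothetical) uses exact fractions and rejects k <= 1, where the rule as stated does not apply.

```python
from fractions import Fraction


def chebyshev_min_fraction(k):
    """Smallest fraction of values guaranteed within mean +/- k standard deviations."""
    k = Fraction(k)  # exact arithmetic for int or float k
    if k <= 1:
        raise ValueError("Chebyshev's rule applies only for k > 1")
    return 1 - 1 / (k * k)


for k in (2, 3):
    bound = chebyshev_min_fraction(k)
    print(f"k = {k}: at least {bound} = {float(bound):.1%}")
# k = 2: at least 3/4 = 75.0%
# k = 3: at least 8/9 = 88.9%
```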
b. Empirical Rule for mound-shaped (symmetrical) distributions:
Approximately 68% of the values of the random variable fall within ±s of the mean.
Approximately 95% of the values of the random variable fall within ±2s of the mean.
Approximately 99.7% of the values of the random variable fall within ±3s of the mean.
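The Empirical Rule percentages are rounded values for a normal distribution, where the fraction within ±k standard deviations equals erf(k/√2). This sketch assumes the mound-shaped distribution is exactly normal and uses math.erf to reproduce the 68/95/99.7 figures; compare 95.45% with Chebyshev's guaranteed 75% for k = 2.

```python
import math


def normal_fraction_within(k):
    """Fraction of a normal distribution within mean +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} std dev: {normal_fraction_within(k):.2%}")
# within ±1 std dev: 68.27%
# within ±2 std dev: 95.45%
# within ±3 std dev: 99.73%
```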
Mean & Variance for Grouped Data
Mean_Variance_-Grouped_Data.swf
Suppose the data are listed in tabular form, like the data sheet used to create a histogram; that is, only the bins and their frequencies are given.
The mean and variance are then estimated as
$$\bar{x} = \frac{\sum fM}{\sum f}$$
and
$$S^2 = \frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}$$
where f is the frequency of each bin, M is the midpoint of each bin (class interval), and n = ∑f is the total of the frequencies.
Example: The number of concert tickets sold in each price range (in dollars) is given as follows.
Price range ($) | f  | M  | fM   | fM²
20-30           | 7  | 25 | 175  | 4375
30-40           | 12 | 35 | 420  | 14700
40-50           | 21 | 45 | 945  | 42525
50-60           | 18 | 55 | 990  | 54450
60-70           | 12 | 65 | 780  | 50700
Total           | 70 |    | 3310 | 166750
Now, the best estimate of the mean ticket price is
$$\bar{x} = \frac{3310}{70} = 47.29$$
and the variance is
$$S^2 = \frac{166750 - \frac{(3310)^2}{70}}{69} = 148.32$$
so s = 12.18 dollars.
These computations assume that every count in a given bin has a value equal to the bin's midpoint, because we do not know the exact price of each ticket in the bin.
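The ticket-price example can be reproduced with a short sketch that stores each bin as (lower, upper, frequency), a layout chosen here for illustration, and applies the grouped-data formulas with bin midpoints, under the same midpoint assumption stated above.

```python
import math

# (lower, upper, frequency) for each price bin in the ticket example.
bins = [(20, 30, 7), (30, 40, 12), (40, 50, 21), (50, 60, 18), (60, 70, 12)]

n = sum(f for _, _, f in bins)
# Every count in a bin is treated as if it sat at the bin midpoint M.
sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in bins)
sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in bins)

mean = sum_fm / n
variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
s = math.sqrt(variance)

print(n, sum_fm, sum_fm2)  # 70 3310.0 166750.0
print(round(mean, 2), round(variance, 2), round(s, 2))  # 47.29 148.32 12.18
```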
5 No. System & Box Plots
5_Number_Sysytem.swf
The 5 number system consists of the values Xmin, Q1, Q2, Q3 and Xmax, where:
Xmin is the minimum value of the random variable, Xmax is the maximum value of the random variable for a given data set.
Q1 is the 1st quartile. 25% of values are below Q1.
Q2 is the 2nd quartile. 50% of values are below Q2. It is the median.
Q3 is the 3rd quartile. 75% of values are below Q3.
Outliers are extreme values in a data set and are frequently ignored as exceptions to the rule in a measurement set. They are often defined as:
Outlier > Q3 + 1.5(Q3 - Q1), or
Outlier < Q1 - 1.5(Q3 - Q1)
The pointer to the data value at a given percentile is computed as
Lp = (n + 1) × P/100
where P is the desired percentile and n is the number of values in the data set, which must first be sorted from smallest to largest. For example, given the data set
1,3,4,7,8,10,12,15,16,17,20,22,25
Find Q1, Q2 & Q3
L(25) = (13 + 1)(0.25) = 3.5, which points halfway between the 3rd and 4th values, 4 and 7, so use their average, (4 + 7)/2 = 5.5
Q1 = 5.5
L(50) = (13 + 1)(0.50) = 7, which points to the 7th value, 12
Q2 = 12
L(75) = (13 + 1)(0.75) = 10.5, which points halfway between the 10th and 11th values, so use their average, (17 + 20)/2 = 18.5
Q3 = 18.5
Also, Xmin = 1 & Xmax = 25
Outliers are > 18.5 + 1.5(18.5 - 5.5) = 38
or < 5.5 - 1.5(18.5 - 5.5) = -14
There are no outliers: no values are > 38 or < -14.
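The pointer method can be sketched as a Python function (the name is hypothetical). One assumption goes beyond the text: when the pointer has a fractional part other than .5, this sketch interpolates linearly between the two neighboring values, which reduces to averaging them for pointers ending in .5, as in the example above. Python's statistics.quantiles with its default method="exclusive" uses the same (n + 1) convention, so it serves as a cross-check.

```python
import math
import statistics


def percentile_by_pointer(sorted_xs, p):
    """Value at percentile p using the pointer Lp = (n + 1) * p / 100.

    Assumption: a fractional pointer is resolved by linear interpolation
    between its two neighbors (their average when the pointer ends in .5).
    Pointers below 1 or above n are clamped to the ends of the data.
    """
    n = len(sorted_xs)
    lp = (n + 1) * p / 100
    if lp <= 1:
        return sorted_xs[0]
    if lp >= n:
        return sorted_xs[-1]
    k = math.floor(lp)   # 1-based position at or just below the pointer
    frac = lp - k        # how far the pointer lies toward position k + 1
    lower, upper = sorted_xs[k - 1], sorted_xs[k]
    return lower + frac * (upper - lower)


data = sorted([1, 3, 4, 7, 8, 10, 12, 15, 16, 17, 20, 22, 25])
q1, q2, q3 = (percentile_by_pointer(data, p) for p in (25, 50, 75))

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr
lower_fence = q1 - 1.5 * iqr
outliers = [x for x in data if x > upper_fence or x < lower_fence]

print("Xmin =", data[0], "Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "Xmax =", data[-1])
# Xmin = 1 Q1 = 5.5 Q2 = 12.0 Q3 = 18.5 Xmax = 25
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
# fences: -14.0 38.0 outliers: []
print(statistics.quantiles(data, n=4))  # [5.5, 12.0, 18.5]
```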
Box plots are a graphical presentation of the 5 number system. PHStat will generate a box plot from raw data. If you have not loaded PHStat, then just compute and list the 5 number system.
Stem-Leaf Plots
Stem-Leaf.swf
The random variable is the quantitative category of data selected for analysis. Examples of random variables are prices of cars sold, stock values for a given company, heights of 8th graders, ages of persons on Medicare and weights of players on a given football team.
The discrete frequency distribution for a given random variable is just the count (frequency of occurrence) of values that fall within each given range of the random variable. We call these ranges bins or class intervals.
Now let's create a stem-leaf plot for the first 3 years (1997-1999) of the stock values in Table 6.1.
1. Round the values to the nearest tenth. For example, 3.28 rounds to 3.3 and 3.34 rounds to 3.3.
2. The stem holds the integer part and the leaf holds the tenths digit. This is an arbitrary choice; it is simply how it was decided to present these data.
3. For the first row of data, the stem-leaf plot would be:
-1|48
-0|87922
0|13887
The -0 stem has leaves of
-0.75 = -0.8 or 8
-0.69 = -0.7 or 7
-0.88 = -0.9 or 9
-0.22 = -0.2 or 2
-0.16 = -0.2 or 2
The creator of the stem-leaf plot must first decide how to define the stem weighting and the format for the leaves. There is no unique way to present a given set of data. As with any chart, it takes experience to present the data clearly.
Usually, we work with whole numbers such as 5, 9, 12, 14, 15, 17, 19, 21, 23, 26, 28, 30, 34, 36, 38. These are displayed as
0|5,9
1|2,4,5,7,9
2|1,3,6,8
3|0,4,6,8
The stem represents tens and the leaf represents units, so each value is stem × 10 + leaf.
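The whole-number construction above can be sketched as follows. The function name is hypothetical; the sketch assumes non-negative whole numbers with stem = tens and leaf = units, and prints leaves without commas (the style suggested for the quiz, e.g. 3|0269). Negative values such as the -0 stem earlier would need separate handling.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Rows like '1|24579': stem = tens, leaf = units (non-negative whole numbers only)."""
    if any(v < 0 or v != int(v) for v in values):
        raise ValueError("this sketch handles non-negative whole numbers only")
    leaves = defaultdict(list)
    for v in sorted(int(v) for v in values):
        stem, leaf = divmod(v, 10)  # e.g. 26 -> stem 2, leaf 6
        leaves[stem].append(str(leaf))
    # List every stem from smallest to largest so empty stems remain visible.
    return [f"{s}|{''.join(leaves[s])}" for s in range(min(leaves), max(leaves) + 1)]


data = [5, 9, 12, 14, 15, 17, 19, 21, 23, 26, 28, 30, 34, 36, 38]
for row in stem_and_leaf(data):
    print(row)
# 0|59
# 1|24579
# 2|1368
# 3|0468
```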