Statistics Discussion


WEEK 4 DISCUSSION:  AREAS UNDER THE CURVE

Let’s pull things together. So far, for data sets, you have calculated MEANS, VARIANCES and STANDARD DEVIATIONS as well as QUARTILES, and used those to determine whether any data are “UNUSUAL”. These would be data that are more than 2 standard deviations above or below the mean, OR data that are more than 1.5 x IQR below Q1 or above Q3.
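The two rules above can be checked with a short Python sketch. It is a minimal illustration, assuming Python 3.8+ (for `statistics.quantiles`), a hypothetical 11-value data set, and the conventional fences Q1 - 1.5 x IQR and Q3 + 1.5 x IQR. The function name `unusual_values` is made up for this example.

```python
import statistics


def unusual_values(data):
    """Return the values flagged by each 'unusual' rule (illustrative sketch)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    low_sd, high_sd = mean - 2 * s, mean + 2 * s

    # Quartiles with Python's default 'exclusive' method; textbooks and
    # calculators may use a different convention and get slightly different values.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_iqr, high_iqr = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    by_sd = [x for x in data if x < low_sd or x > high_sd]
    by_iqr = [x for x in data if x < low_iqr or x > high_iqr]
    return by_sd, by_iqr


# Hypothetical data (not from the assignment), already rank-ordered.
data = [41, 44, 45, 47, 48, 50, 52, 53, 55, 58, 97]
by_sd, by_iqr = unusual_values(data)
print("Unusual by mean +/- 2 SD:", by_sd)
print("Unusual by 1.5 x IQR fences:", by_iqr)
```

For this made-up data set both rules flag only 97: the 2 SD limits are roughly 23.2 and 84.1, and the IQR fences are 30 and 70.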

You have also built FREQUENCY, RELATIVE FREQUENCY and CUMULATIVE RELATIVE FREQUENCY TABLES, and with those you can determine how much of your data (or the probability that a data value) falls above, below, or within a certain range (e.g., 21-30 or 51-60).
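A minimal Python sketch of how such a table can be built, assuming classes of width 10 from 1 to 100 and a hypothetical data set. Because the data are treated as continuous, the sketch places a value x in a class when start <= x < start + 10 (so 10.5 falls in 1-10); that boundary rule is an assumption, not something the assignment specifies.

```python
def frequency_table(data, low=1, high=100, width=10):
    """Rows of (class label, frequency, relative freq., cumulative relative freq.)."""
    n = len(data)
    rows = []
    running_count = 0
    for start in range(low, high + 1, width):
        # Assumed boundary rule: start <= x < start + width.
        count = sum(1 for x in data if start <= x < start + width)
        running_count += count
        # Dividing the running count by n avoids rounding drift in the cumulative column.
        rows.append((f"{start}-{start + width - 1}", count, count / n, running_count / n))
    return rows


data = [41, 44, 45, 47, 48, 50, 52, 53, 55, 58, 97]  # hypothetical values
for label, freq, rel, cum in frequency_table(data):
    print(f"{label:>7}  {freq:>2}  {rel:.4f}  {cum:.4f}")
```

The last cumulative entry is 11/11 = 1.0, which is the same check as the relative frequencies totaling 1.0.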

Currently, we are STANDARDIZING our raw data (x-values) to get z-values, z = (x - MEAN) / STD DEV. Each z-value is the number of standard deviations a data value lies above or below the mean, so standardized data have a mean of zero; if the data are approximately normal, the z-values follow the standard NORMAL DISTRIBUTION. We can use these z-values in our TABLE to determine the probability of data being BELOW (always to the LEFT of) a specific data value. Then, by subtracting that probability from 1.0000, we get the probability of data being ABOVE (to the RIGHT of) that data value. We can then check whether either extreme data point (low end or high end) has a tail probability (from the Table) smaller than one of our “critical” levels (1%, 5% or 10%), which would make that data value “UNUSUAL”.
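The standardizing step and the left/right areas can be sketched in Python with `statistics.NormalDist` (Python 3.8+). This is only a way to check your work, since the assignment asks for probabilities from the printed z-table. The data are hypothetical, the name `z_table_row` is made up, z is rounded to two decimals to mimic a table lookup, and comparing a one-sided tail area with the critical level (5% here) is an assumed reading of the rule above.

```python
from statistics import NormalDist, mean, stdev

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def z_table_row(x, x_bar, s):
    """Return (z, area to the LEFT, area to the RIGHT) for one data value."""
    z = round((x - x_bar) / s, 2)  # two decimals, as read from a printed table
    left = STANDARD_NORMAL.cdf(z)  # P(Z <= z)
    right = 1 - left               # P(Z > z)
    return z, left, right


data = [41, 44, 45, 47, 48, 50, 52, 53, 55, 58, 97]  # hypothetical values
x_bar, s = mean(data), stdev(data)
rows = {x: z_table_row(x, x_bar, s) for x in sorted(data)}
for x, (z, left, right) in rows.items():
    print(f"x={x:>3}  z={z:5.2f}  left={left:.4f}  right={right:.4f}")

alpha = 0.05  # one of the critical levels: 0.01, 0.05 or 0.10
smallest, largest = min(data), max(data)
print("smallest unusual?", rows[smallest][1] < alpha)  # lower-tail area
print("largest unusual?", rows[largest][2] < alpha)    # upper-tail area
```

Here 97 standardizes to about z = 2.85, whose right-tail area (about 0.002) is below 5%, while the smallest value, 41, is not unusual.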

SO, SHOW WHAT YOU KNOW.  Do the required calculations (SOFTWARE is fine for the calculation BUT USE THE ACTUAL TABLES FOR THE Z-VALUE PROBABILITIES)  and fill in the Tables below.

1)  Write down 11 numbers between 1 and 100. These can be whole numbers, but we will assume the data are CONTINUOUS (not DISCRETE). Rank order them from smallest to largest.

Rank     |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11
x-values |    |    |    |    |    |    |    |    |    |    |

MEAN: ____, VARIANCE:____, STD DEV:____, Q1____, Q2____, Q3____, IQR____
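As a check on the blanks above, here is a minimal Python sketch (Python 3.8+, hypothetical data). It assumes the SAMPLE variance (dividing by n - 1) is wanted, and it prints quartiles under two conventions offered by `statistics.quantiles`, because textbooks and calculators compute Q1 and Q3 in slightly different ways; check which method your course uses.

```python
import statistics

raw = [52, 41, 97, 45, 58, 44, 50, 48, 55, 47, 53]  # hypothetical values
data = sorted(raw)  # rank order, smallest to largest

mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance: sum of squared deviations / (n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# 'exclusive' is the default method; 'inclusive' interpolates differently.
q1, q2, q3 = statistics.quantiles(data, n=4)
q1_inc, q2_inc, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"MEAN {mean:.2f}  VARIANCE {variance:.2f}  STD DEV {std_dev:.2f}")
print(f"exclusive: Q1 {q1}  Q2 {q2}  Q3 {q3}  IQR {iqr}")
print(f"inclusive: Q1 {q1_inc}  Q2 {q2_inc}  Q3 {q3_inc}")
```

For these 11 values the default method gives Q1 = 45 and Q3 = 55 (the same as taking the median of each half of the ranked data), while the inclusive method gives 46 and 54.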

2) Use those statistics to determine if any of your data values are “UNUSUAL”.

(a) Mean ± 2 standard deviations = ____ and ____. Unusual data values? ____________

(b) Q1 - 1.5 * IQR = _____ and Q3 + 1.5 * IQR = _____. Unusual data values? _______________

3) Fill in this Frequency Table by putting your data points into the ranges given.

RANGE   | FREQUENCY | RELATIVE FREQ. | CUMULATIVE RELATIVE FREQ.
1-10    |           |                |
11-20   |           |                |
21-30   |           |                |
31-40   |           |                |
41-50   |           |                |
51-60   |           |                |
61-70   |           |                |
71-80   |           |                |
81-90   |           |                |
91-100  |           |                |

(THE RELATIVE FREQ. COLUMN MUST TOTAL 1.0 OR BE VERY CLOSE)

Using the above table:

(a) What percent of your data values are at or below 50? _______

(b) What percent of your data are below 61? ______ What percent are at or below 90? _______

(c) So, what percent are between 61 and 90? ______________
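Part (c) is a subtraction of two cumulative values: P(61 <= X <= 90) = P(X <= 90) - P(X < 61). Below is a minimal Python sketch with hypothetical data and made-up helper names (`percent_below`, `percent_between`). It counts straight from the raw values, which for whole-number data matches reading the cumulative column at 81-90 and at 51-60.

```python
def percent_below(data, value, inclusive=True):
    """Percent of values <= value (inclusive=True) or < value (inclusive=False)."""
    if inclusive:
        count = sum(1 for x in data if x <= value)
    else:
        count = sum(1 for x in data if x < value)
    return 100 * count / len(data)


def percent_between(data, low, high):
    """Percent of values with low <= x <= high."""
    return percent_below(data, high) - percent_below(data, low, inclusive=False)


data = [3, 12, 25, 31, 38, 44, 52, 57, 66, 79, 98]  # hypothetical values
print(f"at or below 50: {percent_below(data, 50):.2f}%")
print(f"below 61: {percent_below(data, 61, inclusive=False):.2f}%")
print(f"at or below 90: {percent_below(data, 90):.2f}%")
print(f"between 61 and 90: {percent_between(data, 61, 90):.2f}%")
```

With these made-up values, 90.91% are at or below 90 and 72.73% are below 61, so 18.18% (2 of the 11 values) fall between 61 and 90.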

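4) Lastly, let’s STANDARDIZE the data (x-values) into z-values using z = (x - MEAN) / STD DEV.

Rank                  |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11
x-values              |    |    |    |    |    |    |    |    |    |    |
z-values              |    |    |    |    |    |    |    |    |    |    |
Probability to LEFT*  |    |    |    |    |    |    |    |    |    |    |
Probability to RIGHT* |    |    |    |    |    |    |    |    |    |    |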

* From the z-TABLES (NOT SOFTWARE), determine the area to the LEFT of each z-value. This is the PROBABILITY that a data value is less than or equal to that data point. Subtract that area from 1.0000 to get the probability that a data value is greater than that data point. These two probabilities MUST add up to 1.0000 (100%), which accounts for all of our data.