Solve the following activities.

Activity 4

Topic 1 – Make a histogram

How to calculate the number of bars (bins) of a histogram when the length of the class interval is known:

(Upper limit – lower limit)/(length class interval) = number of bars

How to calculate length of a class interval of a histogram when the number of bars is known:

(Upper limit – lower limit)/number of bars = length class interval

Complete the tasks:

Create a histogram that displays the number of books bought by 50 part-time college students at ABC College on the x-axis and frequency on the y-axis

The following data are the number of books bought by 50 part-time college students at ABC College. The number of books is discrete data since books are counted.

1 1 1 1 1 1 1 1 1 1 1

2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

4 4 4 4 4 4

5 5 5 5 5

6 6

Eleven students buy 1 book. Ten students buy 2 books. Sixteen students buy 3 books. Six students buy 4 books. Five students buy 5 books. Two students buy 6 books.

Because the data are integers, subtract 0.5 from 1, the smallest data value, and add 0.5 to 6, the largest data value. The starting point is then 0.5 and the ending value is 6.5.

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too many different values, a width that places the data values in the middle of the bar or class interval is the most convenient.

Since the data consist of the numbers 1, 2, 3, 4, 5, 6 and the starting point is 0.5, a width of one places the 1 in the middle of the interval from 0.5 to 1.5, the 2 in the middle of the interval from 1.5 to 2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the middle of the interval from _______ to _______, the 5 in the middle of the interval from _______ to _______, and the _______ in the middle of the interval from _______ to _______ .

Calculate the number of bars as follows: (6.5 – 0.5)/(number of bars) = 1, where 1 is the width of each bar.
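The bin arithmetic for the book data can be checked with a short Python sketch. The variable names are illustrative and not part of the worksheet; the frequencies are the ones stated in the activity.

```python
# Frequencies stated in the activity: number of books -> number of students.
book_counts = {1: 11, 2: 10, 3: 16, 4: 6, 5: 5, 6: 2}
data = [value for value, freq in book_counts.items() for _ in range(freq)]

start = min(data) - 0.5   # smallest value minus 0.5
end = max(data) + 0.5     # largest value plus 0.5
width = 1                 # chosen so each integer sits in the middle of a bar

# (upper limit - lower limit) / (length of class interval) = number of bars
num_bars = int((end - start) / width)

# Interval boundaries, starting at 0.5 and stepping by the width.
edges = [start + i * width for i in range(num_bars + 1)]

# Count the data values that fall strictly inside each interval.
counts = [
    sum(1 for value in data if low < value < high)
    for low, high in zip(edges, edges[1:])
]

for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low} to {high}: {count}")
```

Because every boundary ends in .5 and every data value is an integer, no value can fall on a boundary.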

Review:

http://cnx.org/contents/[email protected]:IKeXSLMS@14/Histograms

A histogram consists of contiguous boxes. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represent (for instance, distance from your home to school). The vertical axis is labeled either frequency or relative frequency. The graph will have the same shape with either label. The histogram (like the stemplot) can give you the shape, the center, and the spread of the data. (The next section tells you how to calculate the center and the spread.)

The relative frequency is equal to the frequency for an observed value of the data divided by the total number of data values in the sample. (In the chapter on Sampling and Data, we defined frequency as the number of times an answer occurs.) If:

· f = frequency

· n = total number of data values (or the sum of the individual frequencies), and

· RF = relative frequency,

then:

RF = f/n

For example, if 3 students in Mr. Smith's English class of 40 students received from 90% to 100%, then f = 3, n = 40, and RF = f/n = 3/40 = 0.075.

Seven and a half percent of the students received 90% to 100%. The scores 90% to 100% are quantitative measures.
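As a quick check of RF = f/n, here is a minimal Python sketch. The function name is an illustrative choice; the numbers are those of the Mr. Smith example.

```python
def relative_frequency(f, n):
    """Return RF = f / n for a frequency f out of n data values."""
    return f / n

rf = relative_frequency(3, 40)
print(rf)            # 0.075
print(f"{rf:.1%}")   # 7.5%
```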

To construct a histogram, first decide how many bars or intervals, also called classes, represent the data.

Many histograms consist of 5 to 15 bars or classes for clarity. Choose a starting point for the first interval that is less than the smallest data value. A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places.

For example, if the value with the most decimal places is 6.1 and this is the smallest value, a convenient starting point is 6.05 (6.1 - 0.05 = 6.05). We say that 6.05 has more precision. If the value with the most decimal places is 2.23 and the lowest value is 1.5, a convenient starting point is 1.495 (1.5 - 0.005 = 1.495). If the value with the most decimal places is 3.234 and the lowest value is 1.0, a convenient starting point is 0.9995 (1.0 - 0.0005 = 0.9995). If all the data happen to be integers and the smallest value is 2, then a convenient starting point is 1.5 (2 - 0.5 = 1.5). Also, when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary.
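The starting-point rule can be sketched in Python as follows. The function name and its parameters are assumptions made for illustration; `Decimal` is used so the subtraction is exact. The four calls reproduce the four examples in the paragraph above.

```python
from decimal import Decimal

def starting_point(smallest, max_decimals):
    """Subtract half a unit in the place after the most precise data value.

    smallest:     the smallest data value, given as a string such as "6.1"
    max_decimals: the largest number of decimal places found in the data
    """
    offset = Decimal(5).scaleb(-(max_decimals + 1))  # 0.05 when max_decimals is 1
    return Decimal(smallest) - offset

print(starting_point("6.1", 1))   # 6.05
print(starting_point("1.5", 2))   # 1.495
print(starting_point("1.0", 3))   # 0.9995
print(starting_point("2", 0))     # 1.5
```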

The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional soccer players. The heights are continuous data since height is measured.

Heights   Frequency     Heights   Frequency
60.0      1             68.0      2
60.5      1             69.0      10
61.0      2             69.5      5
61.5      1             70.0      6
63.5      3             70.5      3
64.0      7             71.0      3
64.5      8             72.0      3
66.0      10            72.5      2
66.5      11            73.0      1
67.0      12            73.5      1
67.5      7             74.0      1

Since the data with the most decimal places has one decimal (for instance, 61.5), we want our starting point to have two decimal places. Since the numbers 0.5, 0.05, 0.005, etc. are convenient numbers, use 0.05 and subtract it from 60, the smallest value, for the convenient starting point.

The smallest data value is 60. 60 - 0.05 = 59.95 which is more precise than, say, 61.5 by one decimal place. The starting point is, then, 59.95.

The largest value is 74. 74 + 0.05 = 74.05 is the ending value.

Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting point from the ending value and divide by the number of bars (you must choose the number of bars you desire). Suppose you choose 8 bars. (74.05 – 59.95)/8 = 1.7625, or about 1.76.

We will round up to 2 and make each bar or class interval 2 units wide. Rounding up to 2 is one way to prevent a value from falling on a boundary. Rounding to the next number is necessary even if it goes against the standard rules of rounding. (Using 1.76 as the width would also work.)

The heights 60 through 61.5 inches are in the interval 59.95 - 61.95. The heights that are 63.5 are in the interval 61.95 - 63.95. The heights that are 64 through 64.5 are in the interval 63.95 - 65.95. The heights 66 through 67.5 are in the interval 65.95 - 67.95. The heights 68 through 69.5 are in the interval 67.95 - 69.95. The heights 70 through 71 are in the interval 69.95 - 71.95. The heights 72 through 73.5 are in the interval 71.95 - 73.95. The height 74 is in the interval 73.95 - 75.95.

The following histogram displays the heights on the x-axis and relative frequency on the y-axis.

The boundaries are:

· 59.95

· 59.95 + 2 = 61.95

· 61.95 + 2 = 63.95

· 63.95 + 2 = 65.95

· 65.95 + 2 = 67.95

· 67.95 + 2 = 69.95

· 69.95 + 2 = 71.95

· 71.95 + 2 = 73.95

· 73.95 + 2 = 75.95
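The boundaries and the height of each bar can be reproduced with a short Python sketch. It assumes the frequency table given earlier for the 100 players; the variable names are illustrative.

```python
# Frequency table from the text: height in inches -> number of players.
heights = {
    60.0: 1, 60.5: 1, 61.0: 2, 61.5: 1, 63.5: 3, 64.0: 7, 64.5: 8,
    66.0: 10, 66.5: 11, 67.0: 12, 67.5: 7, 68.0: 2, 69.0: 10, 69.5: 5,
    70.0: 6, 70.5: 3, 71.0: 3, 72.0: 3, 72.5: 2, 73.0: 1, 73.5: 1, 74.0: 1,
}
n = sum(heights.values())  # total number of players

start, width, num_bars = 59.95, 2, 8
# Round to two decimals to hide floating-point noise in the boundaries.
edges = [round(start + i * width, 2) for i in range(num_bars + 1)]

# Frequency of each class interval, then relative frequency RF = f / n.
counts = [
    sum(f for h, f in heights.items() if low < h < high)
    for low, high in zip(edges, edges[1:])
]
rel_freqs = [c / n for c in counts]

for low, high, c, rf in zip(edges, edges[1:], counts, rel_freqs):
    print(f"{low:.2f} to {high:.2f}: frequency {c}, relative frequency {rf:.2f}")
```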

Activity 5

Topic 1 – Finding Quartiles and Percentiles Using a Table

1. Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were (student data):

AMOUNT OF SLEEP PER      FREQUENCY   RELATIVE    CUMULATIVE RELATIVE
SCHOOL NIGHT (HOURS)                 FREQUENCY   FREQUENCY
4 – < 5                  2           0.04        0.04
5 – < 6                  5           0.10        0.14
6 – < 7                  7           0.14        0.28
7 – < 8                  12          0.24        0.52
8 – < 9                  14          0.28        0.80
9 – < 10                 7           0.14        0.94
10 – < 11                3           0.06        1.00
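The relative and cumulative relative frequency columns follow from the frequency column alone. A minimal Python sketch, with illustrative variable names and exact fractions to avoid rounding drift in the running total:

```python
from fractions import Fraction

# Class intervals (hours) and frequencies from the table.
classes = ["4-<5", "5-<6", "6-<7", "7-<8", "8-<9", "9-<10", "10-<11"]
freqs = [2, 5, 7, 12, 14, 7, 3]
n = sum(freqs)  # total number of students

# Relative frequency of each class: RF = f / n.
rel = [Fraction(f, n) for f in freqs]

# Cumulative relative frequency: running total of the relative frequencies.
cum = []
running = Fraction(0)
for r in rel:
    running += r
    cum.append(running)

for label, f, r, c in zip(classes, freqs, rel, cum):
    print(f"{label:>7}  {f:>2}  {float(r):.2f}  {float(c):.2f}")
```

A percentile can then be read from the table by finding the first class whose cumulative relative frequency reaches the desired proportion.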

Find the 28th percentile:

Find the median:

Find the third quartile:

Find the 10th percentile:

Find the 90th percentile:

For runners in a race, a higher speed means a faster run. Is it more desirable to have a speed with a high or a low percentile when running a race?

The 40th percentile of speeds in a particular race is 7.5 miles per hour. Write a sentence interpreting the 40th percentile in the context of the situation.

On an exam, would it be more desirable to earn a grade with a high or low percentile? Explain.

In a survey collecting data about the salaries earned by recent college graduates, Li found that her salary was in the 78th percentile. Should Li be pleased or upset by this result? Explain.

Review: Measures of the Location of the Data

http://cnx.org/contents/[email protected]:HAUddm43@18/Measures-of-the-Location-of-th

The common measures of location are quartiles and percentiles (%iles). Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile (25th %ile) and the third quartile, Q3, is the same as the 75th percentile (75th %ile). The median, M, is called both the second quartile and the 50th percentile (50th %ile).

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Recall that quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. To score in the 90th percentile of an exam does not mean, necessarily, that you received 90% on a test. It means that 90% of test scores are the same as or less than your score and 10% of the test scores are the same as or greater than your test score.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of the test scores are less (and not the same or less) than your score, it would be acceptable because removing one particular data value is not significant.

The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data. It is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 − Q1

The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile, that is, if it is less than Q1 − (1.5)(IQR) or greater than Q3 + (1.5)(IQR). Potential outliers always need further investigation.
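The outlier rule can be sketched in Python. Two assumptions are made here: the sample is hypothetical (made up for illustration, not taken from the worksheet), and the quartiles are computed as the medians of the lower and upper halves of the ordered data, which is one common convention among several.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of sorted data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # values below the median position
    upper = data[-half:]    # values above the median position
    return median(lower), median(upper)

# Hypothetical sample, made up for illustration only.
sample = [1, 11, 12, 13, 14, 15, 16, 17, 18, 19, 40]

q1, q3 = quartiles(sample)
iqr = q3 - q1                   # IQR = Q3 - Q1
low_fence = q1 - 1.5 * iqr      # values below this are potential outliers
high_fence = q3 + 1.5 * iqr     # values above this are potential outliers
outliers = [x for x in sample if x < low_fence or x > high_fence]

print(q1, q3, iqr, low_fence, high_fence, outliers)
```

Other quartile conventions (for example, interpolating between positions) can give slightly different fences for the same data.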

Example: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, 12th, 13th values

Example: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th values

What value is the middle value?

What values are the 25th and 75th percentiles?

Example: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th values

What value is the 50th percentile?

What values are the 25th and 75th percentiles?

Example:

Activity 6

Topic 1 – Skewness and the Mean, Median, and Mode

Consider the three data sets:

1. 4 5 6 6 6 7 7 7 7 7 7 8 8 8 9 10

2. 4 5 6 6 6 7 7 7 7 8

3. 6 7 7 7 7 8 8 8 9 10

For each data set construct a histogram of the data where each interval has a width of one and each value is located in the middle of an interval. Then answer the questions that follow.

a) Does the histogram display a symmetrical or skewed distribution of data?

b) Can you draw a vertical line at some point in the histogram such that the shape to the left and the right of the vertical line are mirror images of each other?

c) What are the mean, the median, and the mode of the data set?

Data set 1

a.

b.

c.

Data set 2

a.

b.

c.

Data set 3

a.

b.

c.
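Answers to part (c) can be checked with Python's standard `statistics` module. This is only a checking aid with illustrative variable names; `multimode` requires Python 3.8 or later and returns every value tied for the highest frequency.

```python
from statistics import mean, median, multimode

# The three data sets from the activity.
data_sets = {
    1: [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10],
    2: [4, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    3: [6, 7, 7, 7, 7, 8, 8, 8, 9, 10],
}

# For each data set: (mean, median, list of modes).
summary = {
    key: (mean(values), median(values), multimode(values))
    for key, values in data_sets.items()
}

for key, (m, med, modes) in summary.items():
    print(f"Data set {key}: mean={m}, median={med}, mode={modes}")
```

Comparing the mean with the median for each data set is one way to cross-check the answer to part (a): the two agree when the histogram is symmetrical and separate when it is skewed.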

Review:

The "center" of a data set is also a way of describing location. The two most widely used measures of the "center" of the data are the mean (average) and the median. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts. The median is generally a better measure of the center when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers. The mean is the most common measure of the center.

The words "mean" and "average" are often used interchangeably. The substitution of one word for the other is common practice. The technical term is "arithmetic mean" and "average" is technically a center location. However, in practice among non-statisticians, "average" is commonly accepted for "arithmetic mean."

The mean can also be calculated by multiplying each distinct value by its frequency, adding the products, and then dividing the sum by the total number of data values. The letter used to represent the sample mean is an x with a bar over it (pronounced "x bar"): x̄.

The Greek letter μ (pronounced "mew") represents the population mean. One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

To see that both ways of calculating the mean are the same, consider the sample:

1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4
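Both calculations can be carried out on this sample in a few lines of Python. The variable names are illustrative; exact fractions are used so the two results can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

sample = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]
n = len(sample)

# Way 1: add every value, then divide by the number of values.
mean_direct = Fraction(sum(sample), n)

# Way 2: multiply each distinct value by its frequency, add, then divide.
freq = Counter(sample)  # {1: 3, 2: 2, 3: 1, 4: 5}
mean_weighted = Fraction(sum(value * f for value, f in freq.items()), n)

print(mean_direct, mean_weighted, round(float(mean_direct), 1))
```

Both ways give 30/11, which is about 2.7.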

AIDS data indicating the number of months an AIDS patient lives after taking a new antibody drug are as follows (smallest to largest): 3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24, 24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47

Calculate the mean and the median.
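A hand calculation for these 40 values can be checked with a minimal Python sketch using the standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

# Months lived after taking the drug, as listed above (already ordered).
months = [
    3, 4, 8, 8, 10, 11, 12, 13, 14, 15,
    15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
    24, 25, 26, 26, 27, 27, 29, 29, 31, 32,
    33, 33, 34, 34, 35, 37, 40, 44, 44, 47,
]

n = len(months)
sample_mean = mean(months)      # sum of the values divided by n
sample_median = median(months)  # n is even, so the two middle values are averaged

print(n, sample_mean, sample_median)
```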

Suppose that, in a small town of 50 people, one person earns $5,000,000 per year and the other 49 each earn $30,000. Which is the better measure of the "center," the mean or the median?
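The effect of the single large income can be seen by computing both measures. A minimal Python sketch of the scenario described above, with illustrative variable names:

```python
from statistics import mean, median

# One person earns $5,000,000; the other 49 each earn $30,000.
incomes = [5_000_000] + [30_000] * 49

town_mean = mean(incomes)
town_median = median(incomes)

print(f"mean = ${town_mean:,.0f}, median = ${town_median:,.0f}")
```

The mean is pulled far above what 49 of the 50 residents earn, while the median is not affected by the size of the extreme value.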

Another measure of the center is the mode. The mode is the most frequent value. If two values in a data set are tied for the highest frequency, the set is bimodal.

Five real estate exam scores are 430, 430, 480, 480, 495. The data set is bimodal because the scores 430 and 480 each occur twice.
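The two modes can be found with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency. A minimal sketch with illustrative variable names:

```python
from statistics import multimode

scores = [430, 430, 480, 480, 495]

modes = multimode(scores)   # all values tied for the highest frequency
print(modes)                # [430, 480]
print(len(modes) == 2)      # True: the data set is bimodal
```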

When is the mode the best measure of the "center"? Consider a weight loss program that advertises a mean weight loss of six pounds the first week of the program. The mode might indicate that most people lose two pounds the first week, making the program less appealing.

I was able to complete this activity successfully: Yes or No

Name _________________________________________ Date _________________