Hsci 2117: Problem Set 1 – Describing and Summarizing Data

Show all work and calculations for all questions on this problem set. If you use Excel, attach the Excel file to your submission. In your final Word document, describe the process and method you used to solve each question, with enough explanation to show how you reached each answer.

1. Identify the following sets of data as either categorical, discrete, or continuous:

a. Number of first cousins: {6,7,5,2,9,12,3,8} (5%)

b. Country of birth: {Canada, France, Germany, Italy, Japan, Russia, UK, USA, EU} (5%)

c. {3.14159, 2.71828, 1.61803, 1.41421, 0.37396} (5%)

2. List at least three measures of location. Explain how to calculate each of them. (10%)

3. List at least three measures of spread. Explain how to calculate each of them. (10%)

A researcher wishes to conduct a study on differences in protein consumption by country of origin amongst immigrants in NYC. The researcher selects a sample of patients from the population of interest. The data below represents the ages of patients enrolled in the study. Use this data to answer questions 4-6.

52  52  63  55  33  47  71
48  30  52  45  52  40  55
67  57  45  43  49  45  45
38  44  46  53  58  61  44

4. Construct the frequency distribution of patient ages. Group the data into intervals of 5 years (20-24, 25-29, and so on). Be sure to include frequency, percent, cumulative frequency, and cumulative percent. (20%)
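As an illustration of the mechanics only, here is a minimal Python sketch of a grouped frequency table. The function name `frequency_table` and the example ages are made up for this sketch and are not the study data; it assumes whole-number ages and intervals that start at multiples of the interval width.

```python
from collections import Counter

def frequency_table(values, width=5):
    """Return rows of (interval, frequency, percent, cum. frequency, cum. percent)."""
    # Map each value to the lower bound of its interval, e.g. 23 -> 20 when width is 5.
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    rows = []
    cumulative = 0
    # Walk every interval from lowest to highest, including empty ones.
    for lower in range(min(counts), max(counts) + width, width):
        freq = counts.get(lower, 0)
        cumulative += freq
        rows.append((f"{lower}-{lower + width - 1}", freq,
                     100 * freq / n, cumulative, 100 * cumulative / n))
    return rows

# Hypothetical ages, chosen only to demonstrate the layout.
example_ages = [21, 23, 27, 34, 36, 36, 38, 41]
for label, freq, pct, cum, cum_pct in frequency_table(example_ages):
    print(f"{label:>7} {freq:>4} {pct:>7.1f} {cum:>4} {cum_pct:>7.1f}")
```

The last row of any such table should show a cumulative frequency equal to the sample size and a cumulative percent of 100, which is a quick check on the arithmetic.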

5. Calculate the following summary statistics: mean, variance, standard deviation, minimum, median, maximum. Show all work, and any formulas used. (20%)
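The calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a required method: the function name `summary` and the example values are hypothetical, and the variance uses the sample formula (dividing by n - 1), on the assumption that the data are a sample rather than a full population.

```python
import math

def summary(values):
    """Compute basic summary statistics using the sample (n - 1) variance."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance: sum of squared deviations from the mean, divided by n - 1.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    ordered = sorted(values)
    mid = n // 2
    # Median: the middle value, or the average of the two middle values when n is even.
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "minimum": ordered[0],
        "median": median,
        "maximum": ordered[-1],
    }

# Hypothetical values, not the study data.
print(summary([2, 4, 4, 5, 7, 8]))
```

For these six made-up values the mean is 5, the squared deviations sum to 24, and the sample variance is 24 / 5 = 4.8.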

6. Create the following graphs from the data:

a. Stem and Leaf Plot (5%)

b. Histogram (with a bin width of 5) (5%)

c. Pie Chart (group the ages by 10s, i.e. 20-29, 30-39, etc.) (5%)
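For the stem and leaf plot, a minimal Python sketch of the construction is shown here. It assumes two-digit whole numbers, with the tens digit as the stem and the ones digit as the leaf; the function name `stem_and_leaf` and the example values are hypothetical and are not the study data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return lines such as '3 | 4 6 6 8', using tens digits as stems."""
    stems = defaultdict(list)
    # Sorting first keeps the leaves in ascending order within each stem.
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical values, chosen so that one stem is empty.
for line in stem_and_leaf([21, 23, 27, 34, 36, 36, 38, 52]):
    print(line)
```

Keeping empty stems in the display preserves the shape of the distribution, so gaps in the data stay visible.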

A researcher conducts a study, and as part of the study, measures the height and weight of patients that she has split into two study groups. The data below represents the height and weight of 20 patients split into two groups enrolled in a study. Use this data to answer question 7.

GROUP 1

Patient ID   A    B    C    D    E    F    G    H    I    J
Height       67   70   67   70   63   69   66   65   65   63
Weight       163  168  147  161  136  150  153  148  146  135

GROUP 2

Patient ID   K    L    M    N    O    P    Q    R    S    T
Height       67   66   63   71   61   69   63   69   66   60
Weight       177  171  171  169  161  177  174  168  165  157

7. Plot the data on a single scatterplot, keeping height on the horizontal axis. Be sure to differentiate patients in the two groups (for example, with different colors). (10%)