Statistics and probability
Hsci 2117: Problem Set 1 – Describing and Summarizing Data
Show all work and calculations for every question on this problem set. If you use Excel, attach the Excel file to your submission. In your final Word document, describe the process and method you used to solve each question, with enough explanation to show how you reached each answer.
1. Identify the following sets of data as either categorical, discrete, or continuous:
a. Number of first cousins: {6,7,5,2,9,12,3,8} (5%)
b. Country of birth: {Canada, France, Germany, Italy, Japan, Russia, UK, USA, EU} (5%)
c. {3.14159, 2.71828, 1.61803, 1.41421, 0.37396} (5%)
2. List at least three measures of location. Explain how to calculate each of them. (10%)
3. List at least three measures of spread. Explain how to calculate each of them. (10%)
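As an illustration of questions 2 and 3, the following minimal Python sketch computes a few common measures of location and spread with the standard-library `statistics` module. Using the first-cousins data from question 1a, and this particular selection of measures, are choices made for this example only; other measures (percentiles, interquartile range, and so on) are equally valid answers.

```python
import statistics

# Data from question 1a (number of first cousins), used only as an example.
cousins = [6, 7, 5, 2, 9, 12, 3, 8]

# Measures of location
mean = statistics.mean(cousins)         # sum of the values divided by their count
median = statistics.median(cousins)     # middle value (or average of the two middle values) of the sorted data
modes = statistics.multimode(cousins)   # most frequent value(s); every value is returned when all are unique

# Measures of spread
data_range = max(cousins) - min(cousins)   # largest value minus smallest value
variance = statistics.variance(cousins)    # sample variance: squared deviations from the mean, divided by n - 1
std_dev = statistics.stdev(cousins)        # sample standard deviation: square root of the sample variance

print(mean, median, modes)
print(data_range, round(variance, 3), round(std_dev, 3))
```

Note that `statistics.variance` and `statistics.stdev` use the sample divisor n - 1; `statistics.pvariance` and `statistics.pstdev` divide by n instead.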
A researcher wishes to conduct a study on differences in protein consumption by country of origin amongst immigrants in NYC. The researcher selects a sample of patients from the population of interest. The data below represents the ages of patients enrolled in the study. Use this data to answer questions 4-6.
52 | 52 | 63 | 55 | 33 | 47 | 71
48 | 30 | 52 | 45 | 52 | 40 | 55
67 | 57 | 45 | 43 | 49 | 45 | 45
38 | 44 | 46 | 53 | 58 | 61 | 44
4. Construct the frequency distribution of patient ages. Group the ages into intervals of width 5 (20-24, 25-29, and so on). Include the frequency, percent, cumulative frequency, and cumulative percent for each interval. (20%)
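One possible way to build such a table is sketched below in Python, using only the standard library. The variable names and the choice to start at the lowest occupied interval are assumptions of this sketch, not requirements of the question; a spreadsheet approach works just as well.

```python
from collections import Counter

# Patient ages from the table above.
ages = [52, 52, 63, 55, 33, 47, 71,
        48, 30, 52, 45, 52, 40, 55,
        67, 57, 45, 43, 49, 45, 45,
        38, 44, 46, 53, 58, 61, 44]

WIDTH = 5  # interval width requested in the question

# Map each age to the lower bound of its interval, e.g. 47 -> 45.
counts = Counter((age // WIDTH) * WIDTH for age in ages)

rows = []
cumulative = 0
for lower in range(min(counts), max(counts) + WIDTH, WIDTH):
    freq = counts.get(lower, 0)   # empty intervals are kept with frequency 0
    cumulative += freq
    rows.append((
        f"{lower}-{lower + WIDTH - 1}",   # interval label
        freq,                             # frequency
        100 * freq / len(ages),           # percent
        cumulative,                       # cumulative frequency
        100 * cumulative / len(ages),     # cumulative percent
    ))

print(f"{'Age':>7} {'Freq':>5} {'Pct':>7} {'Cum':>5} {'CumPct':>7}")
for label, freq, pct, cum, cum_pct in rows:
    print(f"{label:>7} {freq:>5} {pct:>7.1f} {cum:>5} {cum_pct:>7.1f}")
```

The final cumulative frequency should equal the number of patients, and the final cumulative percent should be 100, which gives a quick check on the table.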
6. Create the following graphs from the data:
a. Stem and Leaf Plot (5%)
b. Histogram (with a bin width = 5) (5%)
c. Pie Chart (group the ages by 10s, i.e. 20-29, 30-39, etc.) (5%)
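For part a, a stem and leaf plot can be drafted by hand or with a short script. The Python sketch below is one assumed approach: it takes the tens digit as the stem and the ones digit as the leaf, which suits two-digit ages. The helper name `stem_and_leaf` is hypothetical and introduced only for this example.

```python
from collections import defaultdict

# Patient ages from the table above.
ages = [52, 52, 63, 55, 33, 47, 71,
        48, 30, 52, 45, 52, 40, 55,
        67, 57, 45, 43, 49, 45, 45,
        38, 44, 46, 53, 58, 61, 44]

def stem_and_leaf(values):
    """Return {stem: [sorted leaves]} using tens digits as stems (hypothetical helper)."""
    stems = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves in ascending order
        stems[value // 10].append(value % 10)
    return dict(stems)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

The number of leaves on each stem is also the count for the matching 10-year group, which is the grouping part c asks for.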
A researcher conducts a study in which she measures the height and weight of 20 patients, whom she has split into two study groups. The data below show the height and weight of each patient by group. Use this data to answer question 7.
GROUP 1
Patient ID | A | B | C | D | E | F | G | H | I | J
Height | 67 | 70 | 67 | 70 | 63 | 69 | 66 | 65 | 65 | 63
Weight | 163 | 168 | 147 | 161 | 136 | 150 | 153 | 148 | 146 | 135

GROUP 2
Patient ID | K | L | M | N | O | P | Q | R | S | T
Height | 67 | 66 | 63 | 71 | 61 | 69 | 63 | 69 | 66 | 60
Weight | 177 | 171 | 171 | 169 | 161 | 177 | 174 | 168 | 165 | 157
7. Plot the data on a single scatterplot, keeping height on the horizontal axis. Be sure to differentiate patients in the two groups (for example, with different colors). (10%)
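If you prefer a script to a spreadsheet, the sketch below shows one way such a scatterplot might be drawn in Python. It assumes the third-party `matplotlib` package is installed. The function name `plot_groups`, the output file name, and the color and marker choices are all hypothetical. The axis labels omit units because the problem set does not state them.

```python
# Height and weight for each group, copied from the tables above.
group1 = {
    "height": [67, 70, 67, 70, 63, 69, 66, 65, 65, 63],
    "weight": [163, 168, 147, 161, 136, 150, 153, 148, 146, 135],
}
group2 = {
    "height": [67, 66, 63, 71, 61, 69, 63, 69, 66, 60],
    "weight": [177, 171, 171, 169, 161, 177, 174, 168, 165, 157],
}

def plot_groups(path="scatter.png"):
    """Save a scatterplot of both groups to `path` (hypothetical helper; needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")            # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Height goes on the horizontal axis, weight on the vertical axis.
    ax.scatter(group1["height"], group1["weight"],
               color="tab:blue", marker="o", label="Group 1")
    ax.scatter(group2["height"], group2["weight"],
               color="tab:orange", marker="s", label="Group 2")
    ax.set_xlabel("Height")
    ax.set_ylabel("Weight")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_groups()` writes the figure to `scatter.png`. Using a different marker shape as well as a different color keeps the groups distinguishable if the plot is printed in grayscale.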