Statistics with R Programming
Assignment 3, STA 622, Fall 2020
Show all work, which can include explanations.
1. Recall that in class we worked an example of a continuous pdf: f(x) = 3x^2 for 0 <= x <= 1, and 0 otherwise. We calculated the expected value (population mean) to be 0.75, and we also showed that the cdf is F(x) = x^3 for 0 <= x <= 1 (with F(x) = 0 for x < 0 and F(x) = 1 for x > 1).
(a) Determine the population median of this distribution.
(b) Compare the population mean to the population median. Explain why it makes sense that the mean is lower than the median.
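Hint: hand calculations for problem 1 can be checked numerically in R. The lines below are only a sketch. The names f, F_cdf, pop_mean and pop_median are made up for illustration, and the code uses the fact that the median m of a continuous distribution satisfies F(m) = 0.5.

```r
# pdf and cdf from problem 1
f <- function(x) ifelse(x >= 0 & x <= 1, 3 * x^2, 0)
F_cdf <- function(x) ifelse(x < 0, 0, ifelse(x > 1, 1, x^3))

# Population mean: integrate x * f(x) over the support [0, 1]
pop_mean <- integrate(function(x) x * f(x), lower = 0, upper = 1)$value

# Population median: solve F(m) = 0.5 numerically on [0, 1]
pop_median <- uniroot(function(m) F_cdf(m) - 0.5,
                      interval = c(0, 1), tol = 1e-10)$root

pop_mean
pop_median
```

A numerical check like this does not replace the written work; part (a) should still be solved from the cdf by hand.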
2. Use the following probability distribution to answer parts (a), (b), (c) and (d). Note that p(x) is 0 for any values of x other than 0, 2, 5 and 6.
x    | 0    | 2    | 5    | 6
p(x) | 0.10 | 0.30 | 0.35 | 0.25
(a) Explain why p(x) is in fact a probability distribution.
(b) Determine the cdf F(x), making sure to specify the value of F(x) for all real number inputs, and sketch a graph of F(x).
(c) Calculate the expected value of X (the population mean of X).
(d) Calculate the population variance and standard deviation.
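Hint: the calculations in problem 2 can be double-checked in R. This is a sketch under the usual definitions (the mean is the sum of x * p(x), the variance is the sum of (x - mean)^2 * p(x)). The object names x, p, is_pmf, F_step, mu, sigma2 and sigma are illustrative choices, not required notation.

```r
# Support points and probabilities from the table in problem 2
x <- c(0, 2, 5, 6)
p <- c(0.10, 0.30, 0.35, 0.25)

# (a) A pmf needs nonnegative probabilities that sum to 1
is_pmf <- all(p >= 0) && isTRUE(all.equal(sum(p), 1))

# (b) cdf as a right-continuous step function:
#     0 to the left of the first support point, then the running totals of p
F_step <- stepfun(x, c(0, cumsum(p)))

# (c) Expected value of X
mu <- sum(x * p)

# (d) Population variance and standard deviation
sigma2 <- sum((x - mu)^2 * p)
sigma <- sqrt(sigma2)

is_pmf
F_step(c(-1, 0, 1, 2, 5.5, 6, 10))  # cdf evaluated at a few real inputs
c(mean = mu, variance = sigma2, sd = sigma)
```

Calling plot(F_step, verticals = FALSE) draws the step function, which can be compared with a hand sketch of F(x).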
3. Load the asbio package in R and use the data set called deer. After loading asbio, issue the command data("deer") so that R recognizes this data set. The variables are described below; more information can be found in the asbio manual, which is easy to find with a web search. Please remember that R is case sensitive.
Birth.Yr            Year of birth.
Litter.size         Number of offspring.
Region              Categorical variable with two factor levels: BH = Black Hills, ER = Eastern Region.
Dam.weight          Dam weight in kg.
Total.birth.mass    Mass of litter in kg.
Prop.mass           Total birth mass divided by dam weight.
(a) Use R to make a side-by-side boxplot, with Prop.mass as the response variable and Region as the explanatory variable. Show your code and output.
(b) Use R to acquire the values of the mean, median, standard deviation and IQR for each region. Show your code and output.
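Hint: one possible way to organize the R work for parts (a) and (b) is sketched below. The helper group_summary is a hypothetical name introduced here, not an asbio function. The requireNamespace check is only there so the script does not stop with an error if asbio has not been installed yet, and na.rm = TRUE is a precaution in case of missing values.

```r
# Mean, median, standard deviation and IQR of a numeric response
# within each level of a grouping variable (one row per group)
group_summary <- function(response, group) {
  stats <- function(v) {
    c(mean   = mean(v, na.rm = TRUE),
      median = median(v, na.rm = TRUE),
      sd     = sd(v, na.rm = TRUE),
      IQR    = IQR(v, na.rm = TRUE))
  }
  t(sapply(split(response, group), stats))
}

# Run the analysis only if asbio is installed
# (install.packages("asbio") is a one-time step)
if (requireNamespace("asbio", quietly = TRUE)) {
  library(asbio)
  data("deer")

  # (a) Side-by-side boxplot: response ~ explanatory
  boxplot(Prop.mass ~ Region, data = deer,
          xlab = "Region", ylab = "Prop.mass",
          main = "Prop.mass by Region")

  # (b) Summary statistics for each region
  print(group_summary(deer$Prop.mass, deer$Region))
}
```

Base R calls such as tapply(deer$Prop.mass, deer$Region, mean) give the same numbers one statistic at a time, which some may find easier to present in the write-up.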
(c) Compare the distribution of Prop.mass across regions, based on 4 components of a distribution. As part of this, decide if mean or median is more appropriate, and explain your reasoning. Do the same for standard deviation and IQR.