Applied Multivariate Data Analysis HW2: Due Monday, Feb 9th.

All R code/output should be well commented, with relevant outputs highlighted.

(Q1) Consider the iris data available in R.

a. Construct side-by-side boxplots of the four quantitative variables (SL, SW, PL, PW) for each species. Do not forget to properly label the axes and give a proper title to the plots when needed. [Hint: you will have 3 plots, one for each species. Each plot will contain four boxplots. See the function boxplot() in R.]

b. Construct a pairs plot of the four variables for each of the three species.

c. Define the vector $x = [\text{Sepal.Length}, \text{Sepal.Width}, \text{Petal.Length}, \text{Petal.Width}]^T$.

The dataset has 50 observations of this vector (one for each flower) for each of the 3 species. Thus, we have a sample of size n = 50 for each of the three species. Compute the sample mean (a vector), the sample covariance matrix (a matrix) and the sample correlation matrix (a matrix) for each species.

d. Looking at the pairs plots in (b) and the correlation matrices in (c), do you see any patterns or differences among the species? Explain.

(Q2) Consider the skulls dataset in the HSAUR3 package in R. You will first need to install the package in R to access the dataset. Use the ?skulls command to get more details on the data. A snapshot of the data is shown below.

library(HSAUR3)

## Loading required package: tools

head(skulls)

a. Suppose we want to estimate the population mean of all 4 variables for skulls with epoch c4000BC. Write down the population, the parameter, the sample, and the statistic you will use to answer the question above.

b. From the skulls data set, provide an estimate (numeric value) of the parameter mentioned above. Explain how you obtained it.

c. Now suppose we want to estimate the population variance-covariance matrix. Which estimator will you use? Provide a numeric estimate.

d. Compute the variance-covariance matrix of the estimator of the population mean in part (b).

e. Provide an estimate for the parameter vector $(\mu_{mb} - \mu_{nh}, \mu_{bh} - \mu_{nh})^T$ and compute the covariance matrix of the estimator.
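The quantities asked for in (Q1) and (Q2) follow one pattern: a sample mean vector, a sample covariance matrix $S$, the estimated covariance of the mean $S/n$, and, for a vector of differences of means written as $C\mu$, the estimate $C\bar{x}$ with estimated covariance $C S C^T / n$. The following is a minimal sketch of that pattern, not a solution. It assumes one iris species as a stand-in group, and the contrast matrix `C` is a hypothetical choice made only for illustration.

```r
# Sketch: summary statistics for one group of a multivariate sample.
# 'setosa' from the built-in iris data is only an illustrative group.
X <- as.matrix(iris[iris$Species == "setosa", 1:4])
n <- nrow(X)

xbar <- colMeans(X)   # sample mean vector (length 4)
S    <- cov(X)        # sample covariance matrix (4 x 4), divisor n - 1
R    <- cor(X)        # sample correlation matrix (4 x 4)

# Estimated covariance matrix of the sample mean vector: S / n
cov_xbar <- S / n

# A vector of differences of means is a linear map C %*% mu.
# Hypothetical contrasts: (Sepal.Length - Petal.Width,
#                          Sepal.Width  - Petal.Width)
C <- rbind(c(1, 0, 0, -1),
           c(0, 1, 0, -1))

diff_est <- C %*% xbar             # estimate of C %*% mu
diff_cov <- C %*% S %*% t(C) / n   # estimated covariance of C %*% xbar
```

The same steps apply to any subset of rows and columns: select the rows for the group of interest, then reuse `colMeans()`, `cov()` and the contrast matrix with the columns ordered as in the data.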

(Q3) You can use R to answer this question.

Let $A = \begin{pmatrix} 9 & -2 \\ -2 & 6 \end{pmatrix}$.

a) Is $A$ symmetric?

b) Determine the eigenvalues and eigenvectors of $A$.

c) Write out the spectral decomposition $A = P \Lambda P'$, i.e. find the matrices $P$ and $\Lambda$.

d) Verify that $P P' = P' P = I$ using $P$ from part (c).

e) Find $A^{-1}$.

f) Find the eigenvalues and eigenvectors of $A^{-1}$.

g) Write out the spectral decomposition $A^{-1} = P \Lambda^{-1} P'$, i.e. find the matrices $P$ and $\Lambda^{-1}$.

h) Find the matrices $A^{1/2}$ and $A^{-1/2}$. Verify that $A^{1/2} A^{1/2} = A$ and $A^{-1/2} A^{-1/2} = A^{-1}$.
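The steps in (Q3) can all be built from one call to `eigen()`. The sketch below shows the mechanics on a hypothetical symmetric matrix `M` (not the matrix from the question), assuming all eigenvalues are positive so that the square root and the inverse exist.

```r
# Hypothetical symmetric matrix, used only for illustration
M <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

isSymmetric(M)               # symmetry check

e      <- eigen(M, symmetric = TRUE)
P      <- e$vectors          # orthonormal eigenvectors, one per column
Lambda <- diag(e$values)     # eigenvalues on the diagonal

# Spectral decomposition and functions of M built from it
M_rebuilt  <- P %*% Lambda %*% t(P)                     # P Lambda P'
M_inv      <- P %*% diag(1 / e$values) %*% t(P)         # P Lambda^{-1} P'
M_sqrt     <- P %*% diag(sqrt(e$values)) %*% t(P)       # M^{1/2}
M_inv_sqrt <- P %*% diag(1 / sqrt(e$values)) %*% t(P)   # M^{-1/2}

# Numerical checks; all.equal() allows for floating-point error
all.equal(M_rebuilt, M)
all.equal(crossprod(P), diag(2))          # P'P = I
all.equal(M_inv, solve(M))
all.equal(M_sqrt %*% M_sqrt, M)
all.equal(M_inv_sqrt %*% M_inv_sqrt, solve(M))
```

Comparing with `all.equal()` rather than `==` is deliberate: products of eigenvector matrices rarely reproduce the original entries bit for bit.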

(Q4) Consider the following 3 matrices A, B and C, and determine which of these matrices are positive definite.

$A = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1.25 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.26 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.01 \end{pmatrix}$
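A symmetric matrix is positive definite exactly when all of its eigenvalues are strictly positive (equivalently, when all leading principal minors are positive). One possible way to check this in R is sketched below. The helper `is_pos_def()` and its tolerance `tol` are hypothetical names introduced for illustration, and the two test matrices are not the ones from the question.

```r
# Hypothetical helper: TRUE if the symmetric matrix M is positive definite.
# 'tol' is an assumed numerical tolerance guarding against rounding error.
is_pos_def <- function(M, tol = 1e-8) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M), isSymmetric(M))
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# Illustrative matrices (not A, B, C from the question)
M1 <- matrix(c(2, 1,
               1, 2), nrow = 2, byrow = TRUE)   # eigenvalues 3 and 1
M2 <- matrix(c(1, 2,
               2, 1), nrow = 2, byrow = TRUE)   # eigenvalues 3 and -1

is_pos_def(M1)
is_pos_def(M2)
```

For a 2 x 2 matrix the same conclusion can be reached by hand: the top-left entry and the determinant must both be positive.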

(Q5) Suppose our multivariate data have sample covariance matrix

$S = \begin{pmatrix} 2 & -3 & 2 \\ -3 & 6 & 4 \\ 2 & 4 & 3 \end{pmatrix}$

a) Based on this covariance matrix, how many columns (variables) does the original data matrix have? Can you tell how many rows the original data matrix has?

b) Find the inverse of S.

c) Find and write the correlation matrix for this data set.
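A covariance matrix is p x p, where p is the number of variables, and its entries carry no information about the number of observations. The correlation matrix is obtained by rescaling: $R = D^{-1/2} S D^{-1/2}$, where $D$ is the diagonal matrix of variances. The sketch below illustrates these operations on a hypothetical 2 x 2 covariance matrix `V`, not on the matrix $S$ from the question.

```r
# Hypothetical covariance matrix, used only for illustration
V <- matrix(c(4, 2,
              2, 9), nrow = 2, byrow = TRUE)

p <- ncol(V)        # number of variables in the underlying data

V_inv <- solve(V)   # matrix inverse

# Correlation matrix, two equivalent ways
R        <- cov2cor(V)                  # built-in conversion
D_inv    <- diag(1 / sqrt(diag(V)))     # D^{-1/2}: reciprocal std. deviations
R_manual <- D_inv %*% V %*% D_inv

all.equal(R, R_manual)
all.equal(V %*% V_inv, diag(p))
```

Here the off-diagonal correlation is 2 / (2 * 3) = 1/3, which is a quick way to confirm the rescaling by hand.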

(Q6) The air pollution data set is available on Canvas.

For this problem, we will focus only on the first 16 observations (cities).

You can read the data into R (as a data frame) with the code:

airpol.full <- read.table("airpoll.txt", header = TRUE)

city.names <- as.character(airpol.full[1:16,1])

airpol.data.sub <- airpol.full[1:16,2:8]

# Perform your analysis on the 'airpol.data.sub' subset.

(a) Use R to calculate the sample covariance matrix and the sample correlation matrix for this data subset.

(b) Identify which pairs of variables seem to be strongly associated. Write a paragraph describing the nature (strength and direction) of the relationship between these variable pairs.
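One way to scan a correlation matrix for strongly associated pairs is to list the entries above the diagonal whose absolute value exceeds a cutoff. The sketch below is an illustration only: the built-in `mtcars` data stands in for the air pollution subset, and the cutoff of 0.7 is an assumption, not a rule from the course.

```r
# Stand-in data: a few numeric columns from the built-in mtcars data set
dat <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

S <- cov(dat)   # sample covariance matrix
R <- cor(dat)   # sample correlation matrix

cutoff <- 0.7   # assumed threshold for a "strong" association

# Positions above the diagonal with |r| > cutoff (each pair listed once)
idx <- which(abs(R) > cutoff & upper.tri(R), arr.ind = TRUE)

strong <- data.frame(
  var1 = rownames(R)[idx[, "row"]],
  var2 = colnames(R)[idx[, "col"]],
  r    = R[idx]
)

# Strongest pairs first; the sign of r gives the direction
strong[order(-abs(strong$r)), ]
```

Replacing `dat` with `airpol.data.sub` would apply the same scan to the homework data. A scatterplot of each flagged pair is still worth checking, since a correlation only measures linear association.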