Applied Multivariate Data Analysis - HW3:
(1) Suppose $\mathbf{x} \sim N_2\left( \begin{pmatrix} 5 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \right)$. Complete the following:
(a) Which of the plots below is the correct contour plot for the distribution? Explain your choice by specifying particular characteristics of the plot that correspond to this distribution.
(b) Roughly indicate on your chosen plot from a) where you would expect most of the (x1, x2) data values to be for a random sample. In your answer, indicate where the concentration of (x1, x2) data values would be the largest. Note that you should be able to answer this entire part without actually simulating data.
(c) Use R to draw a contour plot for the distribution of x and add 100 simulated points to it.
(d) Calculate the correlation matrix for x.
(f) Find f(x) at x = µ (hint: use dmvnorm()).
(g) Find f(x) at x = [6, 11]′ by using dmvnorm().
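Parts (c), (d), (f) and (g) can be sketched in R as follows. This is one possible approach, not a reference solution: the grid limits, the seed and the helper name dens_mvn are assumptions made for illustration. MASS::mvrnorm is used for the simulation, and the density is written out by hand so that the block runs without extra packages; dmvnorm() from the hint (assumed here to be the one in the mvtnorm package) should return the same values.

```r
library(MASS)  # for mvrnorm(); ships with standard R installations

mu    <- c(5, 10)
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2, byrow = TRUE)

# Bivariate normal density written out from its definition
# (hypothetical helper; dmvnorm() is the packaged equivalent).
dens_mvn <- function(x, mu, Sigma) {
  d <- x - mu
  q <- drop(t(d) %*% solve(Sigma) %*% d)   # squared Mahalanobis distance
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

# (c) Evaluate the density on a grid and draw the contours.
x1 <- seq(0, 10, length.out = 100)   # grid limits are an assumption
x2 <- seq(5, 15, length.out = 100)
z  <- outer(x1, x2, Vectorize(function(a, b) dens_mvn(c(a, b), mu, Sigma)))
contour(x1, x2, z, xlab = "x1", ylab = "x2")

set.seed(1)                          # arbitrary seed for reproducibility
pts <- mvrnorm(100, mu = mu, Sigma = Sigma)
points(pts, pch = 20, col = "grey40")

# (d) Correlation matrix implied by Sigma.
R <- cov2cor(Sigma)

# (f), (g) Density at the mean and at x = (6, 11)'.
f_mu <- dens_mvn(mu, mu, Sigma)
f_x  <- dens_mvn(c(6, 11), mu, Sigma)
```

Printing R, f_mu and f_x gives the numerical answers; comparing f_mu with f_x shows how quickly the density drops away from the mean along the major axis of the ellipses.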
(Q2) There are three typos in the dataset typo.csv, where the original value was shifted by a factor of ten. Find them as outliers using a chi-squared Q-Q plot of the squared Mahalanobis distances.
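A minimal sketch of the chi-squared Q-Q plot idea is given below. Because the layout of typo.csv is not described here, the block runs on simulated stand-in data with three deliberately inflated entries; the object names (X, typo_rows, suspects) and the seed are assumptions for illustration only. With the real file one would presumably start from X <- read.csv("typo.csv") and keep only the numeric columns.

```r
# Stand-in data (simulated, NOT typo.csv): 100 rows, 3 variables,
# with three single entries multiplied by ten to mimic the typos.
set.seed(42)
X <- as.data.frame(matrix(rnorm(300, mean = 10, sd = 1), ncol = 3))
typo_rows <- c(5, 50, 95)
X[5, 1]  <- X[5, 1]  * 10
X[50, 2] <- X[50, 2] * 10
X[95, 3] <- X[95, 3] * 10

# Squared Mahalanobis distances from the sample mean and covariance.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Chi-squared Q-Q plot: ordered distances against chi-squared quantiles,
# with degrees of freedom equal to the number of variables.
n <- nrow(X)
p <- ncol(X)
q <- qchisq(ppoints(n), df = p)
plot(q, sort(d2),
     xlab = "Chi-squared quantiles",
     ylab = "Ordered squared Mahalanobis distances")
abline(0, 1, lty = 2)

# Rows with the three largest distances are the candidate typos.
suspects <- order(d2, decreasing = TRUE)[1:3]
```

Points far above the dashed reference line are the candidate outliers; suspects holds their row indices so that the corresponding records can be inspected.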
(Q3) We graphically investigate R's built-in dataset swiss, which you can load with data(swiss). The data contain the variables
Fertility common standardized fertility measure
Catholic % of Catholics
Agriculture % of men working in agriculture
Examination % of draftees receiving the highest mark on the army examination
Education % of draftees with education beyond primary school
Infant.Mortality % of live births who live less than 1 year
for 47 counties in the west of Switzerland, dated around 1888. With ?swiss you get more information on the meaning of the variables.
a) Read the help file of stars().
b) Make a star plot of all variables. What can you say about Sierre? R-Hint: data(swiss), stars(swiss, ...)
c) We are interested in the relation between Fertility and Education. Therefore we would like to make a scatterplot of Fertility against Education whose points are stars carrying the information of the other variables. For this we also need the argument location. R-Hint: stars(swiss[, c(2,3,5,6)], location = swiss[, c(4,1)], axes = T, ...)
d) Set the argument draw.segments to TRUE to get segments instead of stars. Place a legend with key.loc.
e) Which relationships can you read off the plots?
Helper R code for Question 3
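?stars                                   # a) help page for stars()
data(swiss)
# b) star plot of all variables, with a key explaining the rays
stars(swiss, key.loc = c(15.5, 1.5))
# c) stars placed at (Education, Fertility); rays show the other four variables
stars(swiss[, c(2, 3, 5, 6)], location = swiss[, c(4, 1)], key.loc = c(45, 90),
      labels = NULL, len = 3, axes = TRUE, xlab = "Education", ylab = "Fertility")
# d) the same plot drawn with segments instead of stars
stars(swiss[, c(2, 3, 5, 6)], location = swiss[, c(4, 1)], draw.segments = TRUE,
      key.loc = c(45, 90), labels = NULL, len = 3, axes = TRUE, xlab = "Education",
      ylab = "Fertility", cex = 0.8)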
(Q4) (Visualization: parallel coordinate plot and conditioning plot) The data quakes.csv contains the measurements of latitude (lat), longitude (long), depth (depth), magnitude (mag) and the number of reporting stations (stations) for 1000 seismic events of MB > 4.0 (body wave magnitude) that occurred in a cube near Fiji since 1964.
a) Load the data saved in quakes.csv.
b) Does the magnitude of the earthquake depend on the depth? (Make a scatterplot.) R-Hint: You can use the command jitter(...) to plot points next to each other rather than overlapping them.
c) Does the number of reporting stations depend on the magnitude? (Make a scatterplot.)
d) Investigate the relationships between all variables in the data using a parallel coordinate plot and a scatterplot matrix. Which method do you find more useful? R-Hint: library(MASS) parcoord(...)
e) How does the depth depend on longitude and latitude? (Plot a point (pch = 20) at the position of each earthquake; the color should be green, orange or red according to the depth.) R-Hint: deepVec <- cut(quakes$depth, breaks=c(0,250,450,700), labels=c("green","orange", "red")) deepVecString <- as.character(deepVec)
f) Look at the help file of coplot() to see how you could answer question e) with this command. Also use coplot() to display how depth and magnitude depend on longitude and latitude.
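The plotting steps in this question can be sketched as below. The block assumes that quakes.csv has the same columns as R's built-in quakes data set, which is used here so that the code runs without the file; axis labels and the choice of conditioning variables in coplot() are illustrative choices rather than the required answer.

```r
# Assumption: quakes.csv has the same columns as R's built-in `quakes`
# data set, which is used here so the sketch runs without the file.
# With the file: quakes <- read.csv("quakes.csv")
library(MASS)   # parcoord()
data(quakes)

# b), c) Jittered scatterplots to reduce overplotting.
plot(jitter(quakes$depth), jitter(quakes$mag),
     xlab = "depth", ylab = "mag")
plot(jitter(quakes$mag), jitter(quakes$stations),
     xlab = "mag", ylab = "stations")

# d) Parallel coordinate plot and scatterplot matrix.
parcoord(quakes)
pairs(quakes)

# e) Colour each event by depth class, following the R-Hint.
deepVec <- cut(quakes$depth, breaks = c(0, 250, 450, 700),
               labels = c("green", "orange", "red"))
deepVecString <- as.character(deepVec)
plot(quakes$long, quakes$lat, pch = 20, col = deepVecString,
     xlab = "long", ylab = "lat")

# f) Conditioning plots: position given depth, then given depth and magnitude.
coplot(lat ~ long | depth, data = quakes)
coplot(lat ~ long | depth * mag, data = quakes)
```

In the coplot() calls the panels are ordered by overlapping intervals of the conditioning variable, so reading them in sequence shows how the spatial pattern shifts as depth (and magnitude) increases.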
(Q5) The data consist of the emissions of three different pollutants from 46 different engines. The data are available in engine.csv.
(a) For each pollutant, verify normality (hint: use a histogram, qqnorm, the Shapiro test and a chi-square plot).
(b) Test the engine data for multivariate normality.
(c) Check for outliers.
(d) Draw pairwise bivariate boxplots and bagplots.
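The univariate checks in part (a) might look roughly like the sketch below. The column names of engine.csv are not given here, so the block runs on simulated stand-in data with hypothetical names p1, p2 and p3; the seed is arbitrary. Bivariate boxplots and bagplots need add-on packages (for example, a bagplot() function is provided by the aplpack package) and are left out of this sketch.

```r
# Stand-in data (simulated, NOT engine.csv): 46 rows, 3 columns.
# The column names p1, p2, p3 are hypothetical.
# With the file: engine <- read.csv("engine.csv")
set.seed(7)
engine <- data.frame(p1 = rnorm(46), p2 = rnorm(46), p3 = rnorm(46))

# Histogram (top row) and normal Q-Q plot (bottom row) for each pollutant.
op <- par(mfrow = c(2, ncol(engine)))
for (v in names(engine)) {
  hist(engine[[v]], main = v, xlab = v)
}
for (v in names(engine)) {
  qqnorm(engine[[v]], main = v)
  qqline(engine[[v]])
}
par(op)

# Shapiro-Wilk p-value for each pollutant; small values speak against normality.
sw_p <- sapply(engine, function(col) shapiro.test(col)$p.value)
```

The named vector sw_p collects one p-value per pollutant, which makes it easy to report the three tests side by side.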
(6) For each of the following pairs of null and alternative hypotheses, interpret the Type I and Type II errors.
(a) H0: The person is healthy H1: The person has Alzheimer’s
(b) H0: Message is real H1: Message is spam