Applied statistics assignment using RStudio

Takeshi
S2-Assignment.pdf

STAT2170 and STAT6180 Applied Statistics

Assignment, Semester 2, 2020

You are required to complete this assignment using R Markdown to compile a reproducible PDF file for your submission. Submit only your PDF file; there is no need to submit your .Rmd file. If you write your assignment in any other way, a 20% penalty will apply to your submission. Some examples that will attract a 20% penalty are:

• Writing your assignment in Microsoft Word and then saving it as a PDF file;
• Compiling your assignment into a Word/HTML document and then saving it as a PDF file;
• Including any screenshot (or equivalent) in your submission;
• Submitting your assignment as an HTML or a Word document file.

You need to submit your assignment via the provided submission link on iLearn by the due date.

You may discuss the assignment in the early stages with your fellow students. However, the assignment submitted should be your own work.

The R Markdown ‘Cheatsheet’ from the RStudio team is given here.

In your answers to the questions below, produce the appropriate R output and explanation of the steps and results. Don’t include any more R output than necessary and include only concise explanations.

A small tutorial on R Markdown

The following notes will help kickstart your R Markdown journey (we discuss some of these in more detail in the Week 6 Part B Lecture and SGTA Week 7).

1. If you see the error message pdflatex not found, then you are in the right place. To knit to a PDF you need to install LaTeX on your computer. A full LaTeX distribution is rather big (e.g. MacTeX is approximately 3.9 GB and MiKTeX 192 MB), but it is the recommended option. Before installing anything, make sure you have admin rights on your computer. If you encounter issues installing LaTeX, you could instead install it via tinytex, which is much more lightweight. Open R and enter the following commands:

install.packages("tinytex")
tinytex::install_tinytex()

• See Week 6 Part B Lecture for some other alternatives.

• Mac users may be asked to install Xcode (another rather big installation). We only need a small piece of it called the command-line tools. To continue, run xcode-select --install in the Terminal app on your Mac.

2. To communicate your assignment results to us (this is the knitting part), you need to know some Markdown and LaTeX syntax. Learning Markdown syntax will help with your formatting, while learning LaTeX syntax will allow you to typeset mathematics in your assignment (copying β into your .Rmd file and assuming it will work is one of the most common errors). Here are some resources to get you started.


• Markdown tutorial - 10-minute tutorial link
• Mathematics in R Markdown link
• Remember Google is your best friend, and you should google whatever error messages you get.

Being able to debug your own code with Google (learning how to select the right keywords will improve your searches) and by trial and error is part of the learning process. Please give this a go before reaching out for help.

• Now create a new R Markdown document from RStudio and knit it.

3. If you are experiencing persistent or last-minute (LaTeX) compiling issues, RStudio Cloud is an excellent platform. Simply upload everything and knit online.

4. If you decide to use the RStudio Cloud platform, you will have to download the PDF file rather than print to a PDF at the end. Printing to a PDF, unfortunately, turns each page into an image, and the submission system will then reject it.

5. We also recommend knitting often so that you know which line(s) of code are causing a problem. (There is a keyboard shortcut for knitting.) This is similar to working in the console, where you run one line at a time to identify an issue.

6. Another common mistake is to use the code read.csv("dat.csv") and then assume R will (magically) know that you are referring to a dat.csv in a folder far, far away from your .Rmd file (probably the Downloads folder). By this point of the semester, you should all have your .Rproj file and workspace set up already, so that everything runs from there. Please go back to the Week 1 lecture for more details.

7. If you are stuck, create a post on the iLearn forum! Also, check earlier posts before creating a new one. Most of the time, your issues have been discussed and resolved already.
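The file-path pitfall in item 6 can be illustrated with a small sketch. It is not part of the assignment: the folder name my-assignment and the toy data are made up, and a temporary directory stands in for your project folder. The point is only that read.csv() resolves a bare file name against the current working directory, so the file has to live in (or be addressed relative to) the folder you are working from.

```r
# Hypothetical project folder: a temporary directory stands in for the
# folder that holds your .Rproj and .Rmd files.
proj_dir <- file.path(tempdir(), "my-assignment")
dir.create(proj_dir, showWarnings = FALSE)

# Toy data, made up for illustration only.
toy <- data.frame(x = 1:3, y = c(2.5, 3.1, 4.8))
write.csv(toy, file.path(proj_dir, "dat.csv"), row.names = FALSE)

# A bare file name is looked up in the current working directory only.
getwd()
found_here <- file.exists("dat.csv")

# Addressing the file inside the project folder works no matter where
# the file was originally downloaded to.
dat <- read.csv(file.path(proj_dir, "dat.csv"))
str(dat)
```

When knitting, chunks are by default evaluated with the .Rmd file's own folder as the working directory, so keeping the data file next to the .Rmd file (inside the project) lets read.csv("dat.csv") work both interactively and when knitting.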


Question 1 [40 marks]

Chief executive officer (CEO) compensation varies significantly from firm to firm. For this question, you will report on a sample of firms from a survey by Forbes magazine to establish important patterns in the compensation of CEOs. The data is available in the file compensation.csv on iLearn. This data will be used to study CEO and firm characteristics to determine the important factors influencing CEO compensation.

• COMP: Sum of salary, bonus and other compensation, in thousands of dollars. Other compensation does not include stock gains.
• AGE: The CEO's age, in years.
• EXPER: Number of years as the firm's CEO.
• SALES: Sales revenues, in millions of dollars.
• PROF: Profit of the firm, before taxes, in millions of dollars.

a. [8 marks] Produce a matrix scatterplot of the data and describe the possible relationships between the response and predictors and relationships between the predictors themselves. Explain if the data is suitable for a multiple regression model.

b. [2 marks] Compute the correlation matrix of the dataset and reconcile the correlation matrix with the matrix scatterplot in part a) above.

c. [10 marks] Fit a multiple regression model using all the predictors to explain the COMP response. Conduct an F-test for the overall regression, i.e. test whether there is any relationship between the response and the predictors. Assume a significance level of 0.05. In your answer:

• Write down the mathematical multiple regression model for this situation, defining all appropriate parameters.
• Write down the hypotheses for the overall ANOVA test of multiple regression.
• Produce an ANOVA table for the overall multiple regression model (one combined regression SS source is sufficient).
• Compute the F statistic for this test.
• State the null distribution.
• Compute the P-value.
• State your decision and conclusion.

d. [2 marks] Using the backward model selection procedure discussed in the course, find the best multiple regression model that explains the data by using COMP as the response and start with all the predictors provided.

e. [6 marks] Validate your final model and explain why it is not appropriate to use the multiple regression model to explain the COMP response.

f. [1 mark] Use the inverse square root, i.e. 1/sqrt(·), to transform the response variable COMP and the SALES variable. You can overwrite the existing columns in your dataframe.

g. [3 marks] Re-fit the model using 1/sqrt(COMP) as the new response variable and 1/sqrt(SALES) as the transformed predictor. In your answer, use the backward selection procedure discussed in the course to find the best multiple regression model to explain 1/sqrt(COMP). You should again include all the predictors provided in your initial model.

h. [8 marks] Validate your final model with the 1/sqrt(COMP) response and 1/sqrt(SALES) variable. In particular, in your answer, explain why the regression model with 1/sqrt(COMP) response variable is superior to the model in d.
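The mechanics behind parts a to h can be sketched in R on simulated data. This is a minimal illustration under stated assumptions, not a model answer: the data frame sim and the variables x1, x2, x3 and y are invented (they are not the Forbes sample), and the backward step uses a simple "drop the least significant predictor if its p-value exceeds 0.05" rule, which may differ from the backward selection procedure taught in the course.

```r
set.seed(1)  # simulated data only; not the contents of compensation.csv
n <- 60
sim <- data.frame(
  x1 = rnorm(n, mean = 50, sd = 8),
  x2 = rexp(n, rate = 1 / 10),
  x3 = rnorm(n)                      # a pure-noise predictor
)
sim$y <- 100 + 2 * sim$x1 + 5 * sim$x2 + rnorm(n, sd = 20)

# Matrix scatterplot and correlation matrix (parts a and b).
pairs(sim, panel = panel.smooth)
cor_mat <- cor(sim)
round(cor_mat, 2)

# Overall F-test built from the sums of squares (part c).
full <- lm(y ~ x1 + x2 + x3, data = sim)
k <- length(coef(full)) - 1                   # number of predictors
rss <- sum(resid(full)^2)                     # residual SS
tss <- sum((sim$y - mean(sim$y))^2)           # total SS
f_stat <- ((tss - rss) / k) / (rss / (n - k - 1))
p_val <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

# One backward step (parts d and g), ASSUMING a p-value rule at 0.05:
# drop the predictor with the largest p-value if it is not significant.
p_terms <- summary(full)$coefficients[-1, "Pr(>|t|)"]
worst <- names(which.max(p_terms))
reduced <- if (max(p_terms) > 0.05) {
  update(full, as.formula(paste(". ~ . -", worst)))
} else {
  full
}

# Residual diagnostics (parts e and h): residuals vs fitted, normal Q-Q.
par(mfrow = c(1, 2))
plot(reduced, which = 1:2)

# Inverse square-root transform of a positive response (part f), then refit.
sim$y_inv <- 1 / sqrt(sim$y)
refit <- lm(y_inv ~ x1 + x2 + x3, data = sim)
```

The hand-computed f_stat should agree with the value reported by summary(full), which is a useful check that the ANOVA table has been assembled correctly. In a real analysis the backward step would be repeated until every remaining predictor is significant.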


Question 2 [28 marks]

An experiment was conducted to determine the effect of recipe and baking temperature on the breaking angle of a chocolate cake. Breaking angle is a measure of cake quality. Six different chocolate cake recipes are considered, and each cake was baked at a randomly assigned temperature.

• Angle: Angle at which the cake broke
• Recipe: Recipe that was used for the cake
• Temp: Temperature at which the cake was baked. Levels are 175C, 185C, 195C, 205C, 215C, 225C

The data is available in the file cake.dat on iLearn.

a. [2 marks] For this study, is the design balanced or unbalanced? Explain why.

b. [5 marks] Construct two different preliminary graphs that should be used to assess the relationship between the Angle response and the Recipe and Temp factors. Comment on the graphs.

c. [15 marks] Analyse the data, stating the null and alternative hypotheses for each test, and check assumptions. Hints:

• Write down the full interaction model
• Write down the hypotheses about the interaction term
• Perform a two-way analysis and show that the interaction term is insignificant
• Include model diagnostics
• Interpret the results
• Try the log-transformed response variable log(Angle) and compare results
• Write down the hypotheses for testing the main effects
• Include model diagnostics (again)

d. [5 marks] Repeat the above analysis for the main effects only, using a square root transformation of the response variable Angle, i.e. sqrt(Angle). Please include model diagnostics and a conclusion.

e. [1 mark] State your conclusions about the effect of Temp and Recipe on the Angle response. These conclusions are only required to be at the qualitative level and can be based on the outcomes of the hypothesis tests in c. and the preliminary plots in b. You do not need to statistically examine the multiple comparisons between contrasts and interactions.
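The steps in parts a to d can be sketched in R on simulated data. This is a minimal illustration, not a model answer: the data frame cake_sim is invented (three recipes, three temperatures, four cakes per cell) and is not the contents of cake.dat. Only the column names Angle, Recipe and Temp are taken from the description above. With the real file, Recipe and Temp may be read in as numbers and would then need converting with factor() before fitting.

```r
set.seed(2)  # simulated data only; not the contents of cake.dat
cake_sim <- expand.grid(
  Recipe = factor(paste0("R", 1:3)),
  Temp   = factor(c(175, 185, 195)),
  rep    = 1:4                       # four simulated cakes per cell
)
cake_sim$Angle <- 30 + 2 * as.integer(cake_sim$Temp) +
  rnorm(nrow(cake_sim), sd = 3)

# A design is balanced when every Recipe x Temp cell has the same count.
cell_counts <- table(cake_sim$Recipe, cake_sim$Temp)
cell_counts
balanced <- length(unique(as.vector(cell_counts))) == 1

# Two preliminary graphs: boxplots per cell and an interaction plot.
boxplot(Angle ~ Recipe + Temp, data = cake_sim, las = 2)
with(cake_sim, interaction.plot(Temp, Recipe, Angle))

# Full interaction model first, then the main-effects model.
fit_int <- lm(Angle ~ Recipe * Temp, data = cake_sim)
anova(fit_int)
fit_main <- lm(Angle ~ Recipe + Temp, data = cake_sim)
anova(fit_main)

# Diagnostics: residuals vs fitted and normal Q-Q plot.
par(mfrow = c(1, 2))
plot(fit_main, which = 1:2)

# Same main-effects model with a square-root transformed response.
fit_sqrt <- lm(sqrt(Angle) ~ Recipe + Temp, data = cake_sim)
anova(fit_sqrt)
```

Note that anova() reports sequential sums of squares, so in an unbalanced design the result for each main effect depends on the order in which the factors enter the model; in a balanced design the order does not matter.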
