HW 3: RStudio
> library(readxl)
> caschool_4_1_ <- read_excel("C:/Users/Owner/Downloads/caschool (4) (1).xlsx")
> View(caschool_4_1_)
#Question 1: There are 18 variables in the data set.
#Question 2: str = student-teacher ratio, and testscr = average test score.
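#A minimal sketch of how the variable count and names can be read off a data frame with ncol() and names(). The data frame `toy` and its values are hypothetical stand-ins, not the caschool data.

```r
# Hypothetical stand-in for the spreadsheet; the real file has its own columns.
toy <- data.frame(
  str      = c(18.5, 20.1, 21.3),
  testscr  = c(661.2, 655.0, 648.7),
  district = c("A", "B", "C")
)

n_vars    <- ncol(toy)   # number of variables (columns)
var_names <- names(toy)  # the variable names themselves

print(n_vars)
print(var_names)
```

#Applied to the real data frame, the same two calls would report the variable count and the column names.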
#Question 3: Before finding the mean, first extract the two columns with the commands below:
> str<-caschool_4_1_$str
> testscr<-caschool_4_1_$testscr
#Then find the mean and variance with mean() and var():
> mean(str)
[1] 19.64043
> mean(testscr)
[1] 654.1565
> var(str)
[1] 3.578952
> var(testscr)
[1] 363.0301
#Question 4: Find the correlation and covariance with cor() and cov():
> cor(str,testscr)
[1] -0.2263628
> cov(str,testscr)
[1] -8.159324
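#As a cross-check, the correlation is the covariance divided by the product of the two standard deviations. The sketch below recomputes it from the values printed in this transcript; the variable names are illustrative only.

```r
# Values copied from the console output in this transcript
cov_xy      <- -8.159324   # cov(str, testscr)
var_str     <- 3.578952    # var(str)
var_testscr <- 363.0301    # var(testscr)

# cor(x, y) = cov(x, y) / (sd(x) * sd(y)), where sd = sqrt(var)
r <- cov_xy / (sqrt(var_str) * sqrt(var_testscr))
print(round(r, 4))
```

#The result should agree with the cor() output up to rounding of the inputs.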
#Question 5: Please see scatter plot
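#A minimal sketch of how a scatter plot like this can be drawn with base R's plot(). The vectors `str_toy` and `testscr_toy` are hypothetical values, not the caschool data; with the real data, str and testscr would take their place.

```r
# Hypothetical values standing in for the str and testscr columns
str_toy     <- c(17.9, 19.2, 20.4, 21.5, 22.8)
testscr_toy <- c(668.1, 659.4, 652.0, 649.3, 640.7)

# Scatter plot with the student-teacher ratio on the x-axis
plot(str_toy, testscr_toy,
     xlab = "Student-teacher ratio (str)",
     ylab = "Average test score (testscr)",
     main = "Test scores vs. student-teacher ratio")

# Optional: overlay the least-squares line
fit_toy <- lm(testscr_toy ~ str_toy)
abline(fit_toy, col = "red")
```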
#Question 6: Standard deviation and standard error are found using sd() and std.error():
> sd(str)
[1] 1.891812
> sd(testscr)
[1] 19.05335
#To find the standard error, first install the plotrix package (install.packages("plotrix")), then load it:
> library("plotrix", lib.loc="~/R/win-library/3.5")
> std.error(str)
[1] 0.09231096
> std.error(testscr)
[1] 0.9297082
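#The standard error of the mean can also be computed without plotrix as sd(x)/sqrt(n), assuming no missing values. The sketch below checks this against the reported values. The sample size n = 420 is an inference from the regression output (418 residual degrees of freedom plus 2 estimated coefficients), and the helper name `se` is hypothetical.

```r
# Hypothetical helper: standard error of the mean, assuming no NA values
se <- function(x) sd(x) / sqrt(length(x))

# Inferred sample size: 418 residual df + 2 estimated coefficients
n <- 420

# Recompute from the sd() values printed in this transcript
se_str     <- 1.891812 / sqrt(n)
se_testscr <- 19.05335 / sqrt(n)

print(round(c(str = se_str, testscr = se_testscr), 5))
```

#With the real vectors, se(str) and se(testscr) should match the std.error() output.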
#Question 7: Regression of testscr on str, output of summary(lm(testscr ~ str)):
Call:
lm(formula = testscr ~ str)
Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
str          -2.2798     0.4798  -4.751 2.78e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124, Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF, p-value: 2.783e-06
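#In a one-regressor model the coefficients follow directly from the summary statistics: slope = cov(x, y)/var(x), intercept = mean(y) - slope*mean(x), and R-squared = cor(x, y)^2. The sketch below recomputes them from values printed in this transcript as a sanity check; the variable names are illustrative only.

```r
# Values copied from the console output in this transcript
cov_xy       <- -8.159324   # cov(str, testscr)
var_str      <- 3.578952    # var(str)
mean_str     <- 19.64043    # mean(str)
mean_testscr <- 654.1565    # mean(testscr)
cor_xy       <- -0.2263628  # cor(str, testscr)

slope     <- cov_xy / var_str                  # OLS slope
intercept <- mean_testscr - slope * mean_str   # OLS intercept
r_squared <- cor_xy^2                          # R-squared with one regressor

print(round(c(intercept = intercept, slope = slope, r_squared = r_squared), 4))
```

#These should agree with the Estimate column and the Multiple R-squared line up to rounding of the inputs.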
#Population model equation: testscr = beta0 + beta1(str) + U
#Population model equation for this example, with the estimates substituted: testscr = 698.9330 + (-2.2798)(str) + U
#Regression model equation: predicted testscr = beta0_hat + beta1_hat(str)
#For this example, the regression equation is: predicted testscr = 698.9330 - 2.2798(str)
#Question 8: For str = 20, predicted testscr = 698.9330 - 2.2798(20) = 698.9330 - 45.596 = 653.337
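#A short sketch that recomputes the prediction from the reported coefficients. The function name `predict_testscr` is hypothetical.

```r
# Coefficients copied from the regression output in this transcript
b0 <- 698.9330   # intercept
b1 <- -2.2798    # slope on str

# Hypothetical helper: predicted test score for a given student-teacher ratio
predict_testscr <- function(str_value) b0 + b1 * str_value

print(predict_testscr(20))  # 653.337
```

#With a fitted model object, predict() with newdata = data.frame(str = 20) should give the same value up to rounding of the coefficients.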