hw8 and task 2

profilepbiner
EXCELSREGRESSIONANALYSISTOOL.xlsx

DATA & Analysis Instructions

EXCEL'S REGRESSION ANALYSIS TOOL
Found in the menu for Excel's Data Analysis Tool (if you installed that)
It's free and already on your computer (if you have Excel 2013 and beyond on a PC)
SAMLE DATA*
Insulin sensitivity %C20-22 fatty acids STEP 1
[mg/m2/min] polyunsaturated Open the dialog box for the Data analysis tool, click on the DATA tab and then on the icon for the Data Analysis tool all the way to the right.
Y X
250 17.9
220 18.3
145 18.3
115 18.4
230 18.4
200 20.2
330 20.3
400 21.8
370 21.9 STEP 2
260 22.1 Scroll down to Regression and hit OK
270 23.1
530 24.2
375 24.4
*Data from Motulsky textbook, p. 145
STEP 3
Enter the columns for the Y data and for the X data. Check off as many of the outputs as you like.
Remember to pick your DEPENDENT (outcome) variable as Y and your INDEPENDENT (predictor) variable as X
Otherwise the regression analysis does not make sense!
TIP: Since the output is quite comprehensive, I recommend that you select a new worksheet ply (or tab) for the results.
I selected "new worksheet ply".
Once Excel put the data in there, I renamed the new ply "Regression Results".
You can see that ply / tab below.
Click on it to get an explanation of all the result outputs.
I selected to obtain 3 types of plots: Residual Plots, Line Fit Plots, and Normal Probability Plots.
They originally appeared in the same tab.
However, in order to have more space to add explanations, I moved them to a new tab that I named "Regression Plots".

Regression Results

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.7700025428 Pearson correlation coefficient R
R Square 0.592903916 Square of R
Adjusted R Square 0.5558951811 ignore - only relevant for multiple regression analysis
Standard Error 75.8954867325 of residuals
Observations 13
Hypothesis test for regression
Excel uses a hypothesis test for the analysis of two variances (ANOVA)
Null hypothesis: regression variance ≤ residual variance (MSregression ≤ MSresiduals)
Alternative hypothesis: regression variance > residual variance
Test statistic: F F= MS regression / MS residuals (you can check this out below)
ANOVA
df SS MS F Significance F
Regression 1 92280.9337222659 92280.9337222659 16.020648028 0.0020770122 this is the p-value for the regression
Residual 11 63361.3739700418 5760.1249063674
Total 12 155642.307692308
This table is the "coefficients table" Excel does not provide the formula for the fitted stright line.
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0% You can either take it from a scatter plot you made (should be the same)
Intercept -486.5419945992 193.7160213841 -2.5116249607 0.0289028264 -912.9080829377 -60.1759062607 -912.9080829377 -60.1759062607 or create it here fore yourself:
X Variable 1 37.2077457475 9.2959401573 4.0025801713 0.0020770122 16.7475194122 57.6679720827 16.7475194122 57.6679720827 y = slope * x + intercept
The slope is named after the x variable (in most programs). thus: y = 37.208x - 486.5
Since I didn't check the box for a name for the x and y variables, Excel just calls it X variable 1
(Excel puts the name of the x variable here, if the name is provided in the column)
Next, Excel creates data tables for plots that analyse the residuals and the normality of the data.
For each y coordinate, Excel calculates the "residual":
The difference between that y coordinate and its predicted y value (the corresponding y value from the straight line)
See lecture 7, slide 13
To obtain the standard residual the residual is divided by the standard error
RESIDUAL OUTPUT PROBABILITY OUTPUT Excel puts all observed y values in order from smallest to largest ("ranks" them)
Excel then uses the normal distribution to calculate the precentile of each value.
Observation Predicted Y Residuals Standard Residuals Percentile Y ( the probability that values smaller than this value occur in a normally distributed data set)
1 179.4766542802 70.5233457198 0.9705350799 3.8461538462 115
2 194.3597525792 25.6402474208 0.3528584659 11.5384615385 145 Note: minitab does not rank the actual y values, but the residuals!
3 194.3597525792 -49.3597525792 -0.6792838729 19.2307692308 200 It is more useful to test whether the residuals behave like a normal distribution than the actual y values.
4 198.080527154 -83.080527154 -1.1433457281 26.9230769231 220
5 198.080527154 31.919472846 0.4392725248 34.6153846154 230
6 265.0544694994 -65.0544694994 -0.895272964 42.3076923077 250
7 268.7752440741 61.2247559259 0.842568837 50 260
8 324.5868626953 75.4131373047 1.0378278922 57.6923076923 270
9 328.3076372701 41.6923627299 0.573766037 65.3846153846 330
10 335.7491864196 -75.7491864196 -1.0424525658 73.0769230769 370
11 372.956932167 -102.956932167 -1.4168827835 80.7692307692 375
12 413.8854524892 116.1145475108 1.5979565418 88.4615384615 400
13 421.3270016387 -46.3270016387 -0.6375474643 96.1538461538 530

Plots from Regression Tool

The first plot is the Residuals vs. x values plot
Excel calls it ("name of x variable") Residual plot
This plot answers the question:
Are the errors in the y values (residuals) evenly distributed among all x values?
Or are there any sections on the line (or in the data) that show a bigger variation in y?
This would indicate that the errors are not normally distributed and not totally random.
It could be that they are influenced by the x value
Here you see that blue dots are above and below the line, no matter what the x value is.
Looking at the data in the first tab you see that x values range from ~ 18 to ~ 24.
The default plot here ranges from x=0 to x=30, so it wastes a lot of space with no data.
Tip: Double click the x axis to change its range.
The "Format Axis" menu will open to the right.
Change the minimum and maximum bound for x.
The next plot is Excel's horrible attempt to show the actual data points (blue)
together with the fitted data points (orange) from the line of best fit.
It would be much better to draw the actual line here
and omit the fitted data points.
You can do this easily yourself:
Create a scatter plot from your data and insert a trendline
(with formula and R value)
This plot tests whether the y values are normally distributed.
The data for this plot are in the table called "Probability Output"
I'm repeating the comments I made there.
Excel puts all observed y values in order from smallest to largest ("ranks" them)
Excel then uses the normal distribution to calculate the precentile of each value
( the probability that values smaller than this value occur in a normally distributed data set).
Note: minitab does not rank the actual y values, but the residuals!
It is more useful to test whether the residuals behave like a normal distribution than the actual y values.
Questions to answer with the normal probability plot:
Are y values across their range showing an approximate normal distribution?
Are there any outliers in the values?
(looks like the data point with the highest y score - 530 - may be an outlier!)

X Variable 1 Line Fit Plot

Y 17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 250 220 145 115 230 200 330 400 370 260 270 530 375 Predicted Y 17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 179.47665428024101 194.3597525792232 194.3597525792232 198.08052715396872 198.08052715396872 265.0544694993884 268.77524407413404 324.58686269531705 328.30763727006257 335.74918641955372 372.95693216700914 413.88545248920997 421.32700163870101

X Variable 1

Y

Normal Probability Plot

3.8461538461538463 11.538461538461538 19.23076923076923 26.923076923076923 34.615384615384613 42.307692307692307 50 57.692307692307693 65.384615384615387 73.076923076923066 80.769230769230759 88.461538461538453 96.153846153846146 115 145 200 220 230 250 260 270 330 370 375 400 530

Sample Percentile

Y

X Variable 1 Residual Plot

17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 70.523345719758993 25.640247420776802 -49.359752579223198 -83.080527153968717 31.919472846031283 -65.054469499388404 61.224755925865963 75.413137304682948 41.692362729937429 -75.749186419553723 -102.95693216700914 116.11454751079003 -46.327001638701006

X Variable 1

Residuals