hw8 and task 2

EXCELSREGRESSIONANALYSISTOOL.xlsx

Home >Mathematics homework help >Statistics homework help >hw8 and task 2

DATA & Analysis Instructions

EXCEL'S REGRESSION ANALYSIS TOOL
Found in the menu for Excel's Data Analysis Tool (if you installed that)
It's free and already on your computer (if you have Excel 2013 and beyond on a PC)
SAMLE DATA*
Insulin sensitivity	%C20-22 fatty acids	STEP 1
[mg/m2/min]	polyunsaturated	Open the dialog box for the Data analysis tool, click on the DATA tab and then on the icon for the Data Analysis tool all the way to the right.
Y	X
250	17.9
220	18.3
145	18.3
115	18.4
230	18.4
200	20.2
330	20.3
400	21.8
370	21.9	STEP 2
260	22.1	Scroll down to Regression and hit OK
270	23.1
530	24.2
375	24.4
*Data from Motulsky textbook, p. 145
		STEP 3
		Enter the columns for the Y data and for the X data. Check off as many of the outputs as you like.
		Remember to pick your DEPENDENT (outcome) variable as Y and your INDEPENDENT (predictor) variable as X
		Otherwise the regression analysis does not make sense!
		TIP: Since the output is quite comprehensive, I recommend that you select a new worksheet ply (or tab) for the results.
		I selected "new worksheet ply".
		Once Excel put the data in there, I renamed the new ply "Regression Results".
		You can see that ply / tab below.
		Click on it to get an explanation of all the result outputs.
		I selected to obtain 3 types of plots: Residual Plots, Line Fit Plots, and Normal Probability Plots.
		They originally appeared in the same tab.
		However, in order to have more space to add explanations, I moved them to a new tab that I named "Regression Plots".

Regression Results

SUMMARY OUTPUT
Regression Statistics
Multiple R	0.7700025428		Pearson correlation coefficient R
R Square	0.592903916		Square of R
Adjusted R Square	0.5558951811		ignore - only relevant for multiple regression analysis
Standard Error	75.8954867325		of residuals
Observations	13
Hypothesis test for regression
Excel uses a hypothesis test for the analysis of two variances (ANOVA)
Null hypothesis: regression variance ≤ residual variance (MSregression ≤ MSresiduals)
Alternative hypothesis: regression variance > residual variance
Test statistic: F		F= MS regression / MS residuals				(you can check this out below)
ANOVA
	df	SS	MS	F	Significance F
Regression	1	92280.9337222659	92280.9337222659	16.020648028	0.0020770122	this is the p-value for the regression
Residual	11	63361.3739700418	5760.1249063674
Total	12	155642.307692308
This table is the "coefficients table"										Excel does not provide the formula for the fitted stright line.
	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 95.0%	Upper 95.0%		You can either take it from a scatter plot you made (should be the same)
Intercept	-486.5419945992	193.7160213841	-2.5116249607	0.0289028264	-912.9080829377	-60.1759062607	-912.9080829377	-60.1759062607		or create it here fore yourself:
X Variable 1	37.2077457475	9.2959401573	4.0025801713	0.0020770122	16.7475194122	57.6679720827	16.7475194122	57.6679720827		y = slope * x + intercept
The slope is named after the x variable (in most programs).									thus:	y = 37.208x - 486.5
Since I didn't check the box for a name for the x and y variables, Excel just calls it X variable 1
(Excel puts the name of the x variable here, if the name is provided in the column)
Next, Excel creates data tables for plots that analyse the residuals and the normality of the data.
For each y coordinate, Excel calculates the "residual":
The difference between that y coordinate and its predicted y value (the corresponding y value from the straight line)
See lecture 7, slide 13
To obtain the standard residual the residual is divided by the standard error
RESIDUAL OUTPUT					PROBABILITY OUTPUT		Excel puts all observed y values in order from smallest to largest ("ranks" them)
							Excel then uses the normal distribution to calculate the precentile of each value.
Observation	Predicted Y	Residuals	Standard Residuals		Percentile	Y	( the probability that values smaller than this value occur in a normally distributed data set)
1	179.4766542802	70.5233457198	0.9705350799		3.8461538462	115
2	194.3597525792	25.6402474208	0.3528584659		11.5384615385	145	Note: minitab does not rank the actual y values, but the residuals!
3	194.3597525792	-49.3597525792	-0.6792838729		19.2307692308	200	It is more useful to test whether the residuals behave like a normal distribution than the actual y values.
4	198.080527154	-83.080527154	-1.1433457281		26.9230769231	220
5	198.080527154	31.919472846	0.4392725248		34.6153846154	230
6	265.0544694994	-65.0544694994	-0.895272964		42.3076923077	250
7	268.7752440741	61.2247559259	0.842568837		50	260
8	324.5868626953	75.4131373047	1.0378278922		57.6923076923	270
9	328.3076372701	41.6923627299	0.573766037		65.3846153846	330
10	335.7491864196	-75.7491864196	-1.0424525658		73.0769230769	370
11	372.956932167	-102.956932167	-1.4168827835		80.7692307692	375
12	413.8854524892	116.1145475108	1.5979565418		88.4615384615	400
13	421.3270016387	-46.3270016387	-0.6375474643		96.1538461538	530

Plots from Regression Tool

The first plot is the Residuals vs. x values plot
Excel calls it ("name of x variable") Residual plot
							This plot answers the question:
							Are the errors in the y values (residuals) evenly distributed among all x values?
							Or are there any sections on the line (or in the data) that show a bigger variation in y?
							This would indicate that the errors are not normally distributed and not totally random.
							It could be that they are influenced by the x value
							Here you see that blue dots are above and below the line, no matter what the x value is.
							Looking at the data in the first tab you see that x values range from ~ 18 to ~ 24.
							The default plot here ranges from x=0 to x=30, so it wastes a lot of space with no data.
							Tip: Double click the x axis to change its range.
							The "Format Axis" menu will open to the right.
							Change the minimum and maximum bound for x.
The next plot is Excel's horrible attempt to show the actual data points (blue)
together with the fitted data points (orange) from the line of best fit.
							It would be much better to draw the actual line here
							and omit the fitted data points.
							You can do this easily yourself:
							Create a scatter plot from your data and insert a trendline
							(with formula and R value)
This plot tests whether the y values are normally distributed.
							The data for this plot are in the table called "Probability Output"
							I'm repeating the comments I made there.
							Excel puts all observed y values in order from smallest to largest ("ranks" them)
							Excel then uses the normal distribution to calculate the precentile of each value
							( the probability that values smaller than this value occur in a normally distributed data set).
							Note: minitab does not rank the actual y values, but the residuals!
							It is more useful to test whether the residuals behave like a normal distribution than the actual y values.
							Questions to answer with the normal probability plot:
							Are y values across their range showing an approximate normal distribution?
							Are there any outliers in the values?
							(looks like the data point with the highest y score - 530 - may be an outlier!)

X Variable 1 Line Fit Plot

Y 17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 250 220 145 115 230 200 330 400 370 260 270 530 375 Predicted Y 17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 179.47665428024101 194.3597525792232 194.3597525792232 198.08052715396872 198.08052715396872 265.0544694993884 268.77524407413404 324.58686269531705 328.30763727006257 335.74918641955372 372.95693216700914 413.88545248920997 421.32700163870101

X Variable 1

Normal Probability Plot

3.8461538461538463 11.538461538461538 19.23076923076923 26.923076923076923 34.615384615384613 42.307692307692307 50 57.692307692307693 65.384615384615387 73.076923076923066 80.769230769230759 88.461538461538453 96.153846153846146 115 145 200 220 230 250 260 270 330 370 375 400 530

Sample Percentile

X Variable 1 Residual Plot

17.899999999999999 18.3 18.3 18.399999999999999 18.399999999999999 20.2 20.3 21.8 21.9 22.1 23.1 24.2 24.4 70.523345719758993 25.640247420776802 -49.359752579223198 -83.080527153968717 31.919472846031283 -65.054469499388404 61.224755925865963 75.413137304682948 41.692362729937429 -75.749186419553723 -102.95693216700914 116.11454751079003 -46.327001638701006

X Variable 1

Residuals