
Linear Regression Analysis with Minitab Fitted Line Plot Tool

Dr. Doerre, Data Analyses and Statistical Concepts in Biotechnology, FSU Math 924

A short tutorial

Minitab offers two different tools for linear regression:

TOOL 1: Select Stat -> Regression -> Fitted Line Plot

TOOL 2: Select Stat -> Regression -> Fit Regression Model

When to use which tool?

For simple (one predictor variable) linear regression, and for quadratic or cubic regression, the Fitted Line Plot tool is fully sufficient. Since it has far fewer options, it is easier to use.

The Fit Regression Model tool is mostly needed for multiple linear regression.

What’s the difference?

The Fitted Line Plot tool provides options to output all the results needed to evaluate a linear regression.

In addition, it generates the fitted line plot, showing the data points, the fitted line, and bands for the confidence interval and the prediction interval.

The Fit Regression Model tool does not display the fitted line plot.

However, it provides the p values and confidence intervals for the line coefficients (slope and intercept), which the Fitted Line Plot tool does not.

TIP: Look on Blackboard for the following Excel file, which compares all the output options of these tools with the Excel Regression Data Analysis tool:

Comparison of Output options Minitab Fitted Line Plot, Fit Regression Model, Excel Regression Tool.xls

Where to find the Fitted Line Plot Tool

Select Stat -> Regression -> Fitted Line Plot

Fitted Line Plot Main Menu

In addition to linear regression, it offers quadratic and cubic regression.

The Main Menu has three buttons for more selections: “Graphs”, “Options”, and “Storage”.

These are explained on the next three slides.

Here is where you tell Minitab where to find your data.

Make sure you properly assign the Response (Y) and Predictor (X) variables!

Regression will give a very different result if you swap the two (even though the correlation coefficient would be the same)!

Graphs Sub-Menu

I recommend getting all four residual plots in a single graph by checking “Four in one”.

Options Sub-Menu

Even though its name does not mention graphs or plots, this sub-menu only provides options for the fitted line plot.

Here you can select log-scale axes.

Here you can give your plot a title.

Recommended options (both very useful):

Display a band for the confidence interval

(the band in which the true regression line is expected to lie at your confidence level; default 95%).

Display a band for the prediction interval

(the band in which a single new observation is expected to fall at your confidence level; it tells you how far off a single point could be and still be consistent with the fit).

You will see what these look like in the output sample further below.
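The confidence and prediction bands follow from the standard textbook formulas for simple linear regression. A minimal sketch, assuming a hypothetical data set and a critical t value supplied by hand (Python's standard library has no t quantile function; 3.182 is the two-sided 95% value for n - 2 = 3 degrees of freedom). The function name interval_bands is hypothetical.

```python
import math
import statistics

def interval_bands(x, y, x0, t_crit):
    """Return (fit, CI half-width, PI half-width) at x0 for a simple linear fit.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom,
    supplied by hand because the standard library has no t quantile function.
    """
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))              # Minitab's "S"
    h = 1 / n + (x0 - x_bar) ** 2 / sxx       # grows as x0 moves away from x_bar
    fit = b0 + b1 * x0
    ci_half = t_crit * s * math.sqrt(h)       # band for the line (mean response)
    pi_half = t_crit * s * math.sqrt(1 + h)   # band for a single new observation
    return fit, ci_half, pi_half

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0 in (1.0, 3.0, 5.0):
    # 3.182: two-sided 95% t value for 3 degrees of freedom (n = 5)
    fit, ci, pi = interval_bands(x, y, x0, t_crit=3.182)
    print(f"x0={x0}: fit={fit:.2f}, 95% CI +/-{ci:.2f}, 95% PI +/-{pi:.2f}")
```

Both bands are narrowest at the mean of x and widen toward the ends, which is why they look curved on the plot; the prediction band is always wider because it adds the scatter of individual points (the 1 under the square root).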

Storage Sub-Menu

It looks as if this only refers to numbers that you want to store in a Minitab worksheet, as opposed to having them displayed in the Session window.

However, these numbers are not calculated or displayed at all unless you choose to store them in a worksheet.

I recommend getting: Residuals, Fits (predicted Y values), and Coefficients (slope and intercept)*.

*Slope and intercept are also displayed in the Session window together with the summary results, and on the fitted line plot. However, if you want to use them to calculate y values from x, it’s good to already have them in a worksheet (or use Excel).

PROGRAM OUTPUTS

A. Fitted Line Plot (Line of Best Fit, Regression Line)

As selected in the options, the plot is shown with a 95% confidence interval (CI) band and a 95% prediction interval (PI) band.

B. Regression Analysis Results

Results are shown in the Session window, from where they can be copied with the Copy command or exported with the Send to Word or Send to PowerPoint command.

Regression Analysis: Ins. sens versus %C20-22 FA

The regression equation is
Ins. sens = - 486.5 + 37.21 %C20-22 FA

Model Summary

      S     R-sq   R-sq(adj)
75.8955   59.29%      55.59%

Analysis of Variance

Source       DF       SS        MS       F       P
Regression    1    92281   92280.9   16.02   0.002
Error        11    63361    5760.1
Total        12   155642

See Lecture #7 for explanations of these parameters, and see the following posted Excel file for the special abbreviations and terminology used by Minitab (and by Excel):

“Comparison of Output options Minitab Fitted Line Plot, Fit Regression Model, Excel Regression Tool.xls”
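The Model Summary numbers can be recomputed from the Analysis of Variance table, which shows how they are related. A short Python check using the SS and DF values from the output (the P value is left out because it needs an F distribution function that the standard library lacks):

```python
import math

# Values copied from the Analysis of Variance table (n = 13 observations)
ss_reg, ss_err, ss_tot = 92281.0, 63361.0, 155642.0
df_reg, df_err, df_tot = 1, 11, 12

ms_reg = ss_reg / df_reg                      # mean square, regression
ms_err = ss_err / df_err                      # mean square, error (about 5760.1)
f_stat = ms_reg / ms_err                      # F (about 16.02)
r_sq = ss_reg / ss_tot                        # R-sq (about 59.29%)
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)     # R-sq(adj) (about 55.59%)
s = math.sqrt(ms_err)                         # S, the standard error of the regression

print(f"MS error  = {ms_err:.1f}")
print(f"F         = {f_stat:.2f}")
print(f"R-sq      = {r_sq:.2%}")
print(f"R-sq(adj) = {r_sq_adj:.2%}")
print(f"S         = {s:.2f}")
```

The small difference in S (75.90 here versus 75.8955 in Minitab) comes from the SS values being rounded in the printed table.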

C. Residual Graphs


The Residuals vs Fits plot answers the questions: Are the errors in the y values (residuals) evenly spread across all fitted values (and hence all x values)? Or are there sections of the line (or of the data) that show a bigger variation in y?

The latter would indicate that the errors do not have constant variance and are not purely random, so they might be influenced by the x value.

Here, the blue dots lie above and below the zero line, no matter what the fitted value is.
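A rough numeric version of what your eye does on the Residuals vs Fits plot is to sort the residuals by fitted value and compare their spread in the lower and upper halves. The helper spread_by_fit and the residuals below are hypothetical, and this is an informal check, not a formal test for unequal variance.

```python
import statistics

def spread_by_fit(fits, residuals):
    """Standard deviation of residuals for the lower and upper half of the fits.

    With an odd number of points, the middle one is left out.
    """
    pairs = sorted(zip(fits, residuals))      # order residuals by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.stdev(low), statistics.stdev(high)

# Hypothetical residuals whose spread grows with the fitted value (a "funnel")
fits = [10, 20, 30, 40, 50, 60, 70, 80]
residuals = [1, -1, 2, -2, 6, -7, 9, -8]
sd_low, sd_high = spread_by_fit(fits, residuals)
print(f"SD of residuals, low fits: {sd_low:.2f}; high fits: {sd_high:.2f}")
```

A much larger spread in one half corresponds to the funnel shape you would see on the plot; evenly scattered residuals give similar values.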

The Normal Probability Plot: Minitab puts all calculated residuals in order from smallest to largest ("ranks" them). It then uses the normal distribution to assign each ranked residual a percentile (the probability that a smaller value occurs in a normally distributed data set) and plots the residuals against these percentiles. If the residuals are normally distributed, the points fall approximately on a straight line. Questions to answer with the normal probability plot:

Are the residuals across their range approximately normally distributed?

Are there any outliers among the residuals?
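The ranking step can be sketched in Python. The plotting-position formula (i - 0.3)/(n + 0.4), Benard's median-rank approximation, is an assumption for illustration; Minitab's exact choice may differ. The helper name normal_plot_points and the residuals are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each ranked residual with its plotting percent and normal score.

    Plotting positions use Benard's approximation (i - 0.3) / (n + 0.4),
    an assumption for illustration.
    """
    ranked = sorted(residuals)              # rank from smallest to largest
    n = len(ranked)
    points = []
    for i, r in enumerate(ranked, start=1):
        p = (i - 0.3) / (n + 0.4)           # cumulative probability for rank i
        z = NormalDist().inv_cdf(p)         # matching standard normal quantile
        points.append((r, 100 * p, z))
    return points

residuals = [-12.0, 5.5, 3.1, -4.2, 8.0, -0.4, 0.9]   # hypothetical residuals
for r, pct, z in normal_plot_points(residuals):
    print(f"residual {r:6.1f}  percent {pct:5.1f}  normal score {z:6.2f}")
```

Plotting the residuals against the normal scores (or, as Minitab does, against the percent on a normal probability scale) gives roughly a straight line for normal residuals; a point far off the line at either end suggests an outlier.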

Residuals vs. order of data collection:

This plot can show whether something abnormal happened during data collection, such as a drift over time.

Residuals Histogram

Frequency distribution of residuals

D. Results Minitab puts directly into the worksheet, in the next free columns

(due to the options selected in the Storage sub-menu)


We chose to store the residuals and the fits.

Minitab calls the y values on the regression line the “fits”; others call them the predicted y values. We also chose to store the coefficients: the y intercept and the slope.

The numbers for the residuals and fits help you understand conceptually what the residuals are: the difference between the actual y value and the fitted (predicted) value from the line. Try it out! You can easily create a residuals table yourself. You can also try to find these residuals in the residual graphs; they should all be there.

Figure: Residual Plots for Insulin sens. (Four in one: Normal Probability Plot, Percent vs. Residual; Versus Fits, Residual vs. Fitted Value; Histogram, Frequency vs. Residual; Versus Order, Residual vs. Observation Order)