Regression Homework Assignment

Excel Analyses

Dr. Sarah M. Brown

Content Analyses

Research technique used to make replicable and valid inferences by interpreting and coding textual material (or pictures).

This qualitative research can be turned into quantitative research.

Content Analysis

Identify the text to be used

Twitter

Identify the data set to be used

LeBron James's Twitter page

Collect the data as appropriate given your research objectives (Hambrick et al., 2010)

Interactivity: direct communication with fellow athletes and fans

Diversion: non-sports related information provided by athlete

Information sharing: insight into the athlete's teammates or sport (details about practice, training, etc.).

Content: Links to pictures, videos or other websites.

Promotional: publicity for sponsorships, upcoming games, etc.

Fanship: the athlete discusses sports other than their own team and teammates.

Record data as required using an appropriate framework, making a note each time a category comes up.

Analyze data
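
The recording and analysis steps above can be sketched in Python as a simple tally. This is only an illustration: the function name `tally_codes` and the five coded posts are hypothetical and were made up for this example; they are not real data from the Twitter page.

```python
from collections import Counter

# Categories from Hambrick et al. (2010), as listed above.
CATEGORIES = (
    "Interactivity", "Diversion", "Information sharing",
    "Content", "Promotional", "Fanship",
)

def tally_codes(coded_posts):
    """Count how often each category was assigned across all coded posts.

    coded_posts: list of lists; each inner list holds the categories
    assigned to one post (a post may receive more than one category).
    """
    counts = Counter({category: 0 for category in CATEGORIES})
    for codes in coded_posts:
        for code in codes:
            if code not in CATEGORIES:
                raise ValueError(f"Unknown category: {code}")
            counts[code] += 1
    return counts

# Hypothetical coding results for five posts (illustration only).
example_posts = [
    ["Interactivity"],
    ["Promotional", "Content"],
    ["Diversion"],
    ["Interactivity", "Fanship"],
    ["Information sharing"],
]

counts = tally_codes(example_posts)
total = sum(counts.values())
for category in CATEGORIES:
    share = counts[category] / total
    print(f"{category:20s} {counts[category]:2d}  {share:.0%}")
```

Turning the coded categories into counts and percentages is what lets the qualitative coding feed into a quantitative analysis.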

Lebron James Twitter

T-tests

A t-test is an inferential statistic used to determine if there is a significant difference between the means of two groups.

One-sample t-test: tests whether there is a statistically significant difference between a sample mean and a known or hypothesized population mean.

Two-sample t-test: tests whether the unknown population means of two independent groups are equal.

Paired-sample t-test: tests whether the mean difference between two related measurements on the same participants, such as a pre-test and a post-test, is zero.

One Sample t-test

Professional sports teams use the vertical jump to gauge the strength of athletes. This test requires a person to stand flat-footed, leap vertically into the air, and touch the highest reachable mark. A typical NFL player can jump 30 inches. That is, the population average is 30 inches.

An athletic trainer recruited a group of 15 college athletes and had them perform the vertical jump test. Conduct a one-sample t-test to determine whether the mean vertical jump of the 15 college athletes was greater than the population average.

One sample t-test

Vertical Jump Population Average
25 30
31 30
32
27
33
34
31
30
29
27
32
28
28
27
29

One-sample t-test results

Results: The average vertical jump of the 15 college athletes (29.53 inches) is less than the population average of 30 inches, but the difference is not statistically significant (p > .05), so we fail to reject the null hypothesis.

t-Test: Two-Sample Assuming Unequal Variances
  Vertical Jump  
Mean 29.53333333 30
Variance 6.695238095 0
Observations 15 2
Hypothesized Mean Difference 0
df 14
t Stat -0.698504804
P(T<=t) one-tail 0.248150781
t Critical one-tail 1.761310136
P(T<=t) two-tail 0.496301562
t Critical two-tail 2.144786688  
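
As a cross-check on the Excel output above, the one-sample t statistic can be computed by hand in Python. This is a minimal sketch using only the standard library. The critical value is copied from the output rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Vertical jump heights (inches) for the 15 college athletes.
jumps = [25, 31, 32, 27, 33, 34, 31, 30, 29, 27, 32, 28, 28, 27, 29]
population_mean = 30  # hypothesized population average

n = len(jumps)
sample_mean = statistics.mean(jumps)
sample_variance = statistics.variance(jumps)  # n - 1 in the denominator
standard_error = math.sqrt(sample_variance / n)

t_stat = (sample_mean - population_mean) / standard_error
df = n - 1

# One-tail critical value for df = 14 at alpha = .05, copied from the output.
t_critical_one_tail = 1.761310136
# Alternative hypothesis: the sample mean is greater than 30.
reject_null = t_stat > t_critical_one_tail

print(f"mean = {sample_mean:.4f}, variance = {sample_variance:.4f}")
print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

The mean, variance, t Stat, and df should agree with the Excel rows, and the decision matches the result stated above: fail to reject the null hypothesis.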

One-sample t-test

The average 40-yard dash time for a wide receiver in the NFL is 4.48 seconds.

t-Test: Two-Sample Assuming Unequal Variances
  40-Yard Dash  
Mean 4.533333333 4.48
Variance 0.035238095 0
Observations 15 2
Hypothesized Mean Difference 0
df 14
t Stat 1.100368489
P(T<=t) one-tail 0.144866221
t Critical one-tail 1.761310136
P(T<=t) two-tail 0.289732442
t Critical two-tail 2.144786688  
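
When only the summary rows are available, the same t statistic can be rebuilt from the mean, variance, and number of observations. The sketch below does this for the 40-yard dash output. The helper name `one_sample_t` is hypothetical, and the two-tailed decision rule is an assumption, since the slide does not state the direction of the hypothesis.

```python
import math

def one_sample_t(sample_mean, sample_variance, n, hypothesized_mean):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    standard_error = math.sqrt(sample_variance / n)
    return (sample_mean - hypothesized_mean) / standard_error, n - 1

# Summary values copied from the Excel output above.
t_stat, df = one_sample_t(4.533333333, 0.035238095, 15, 4.48)

# Two-tail critical value for df = 14 at alpha = .05, also from the output.
t_critical_two_tail = 2.144786688
significant = abs(t_stat) > t_critical_two_tail  # assumed two-tailed test

print(f"t = {t_stat:.4f}, df = {df}, significant at .05: {significant}")
```

Because the absolute t Stat is smaller than the critical value (and both p-values in the output exceed .05), the sample mean of 4.53 is not significantly different from 4.48.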

Two Sample t-test

A researcher was interested in understanding whether home-field advantage (HFA) impacted total scoring for a basketball team. To start exploring this phenomenon, the researcher analyzed total points scored in home versus away games for a season.

t-Test: Two-Sample Assuming Unequal Variances
  Away Games Home Games
Mean 74.4 77.06666667
Variance 175.1142857 102.6380952
Observations 15 15
Hypothesized Mean Difference 0
df 26
t Stat -0.619705665
P(T<=t) one-tail 0.27042247
t Critical one-tail 1.70561792
P(T<=t) two-tail 0.540844939
t Critical two-tail 2.055529439  

Two Sample t-test Example #2

An organization wanted to know whether certain groups performed better under different leadership styles. To investigate, the researcher evaluated productivity through monthly sales.

t-Test: Two-Sample Assuming Unequal Variances
  Leader A Leader B
Mean 1.654545455 1.418181818
Variance 0.230727273 0.613636364
Observations 11 11
Hypothesized Mean Difference 0
df 17
t Stat 0.853124167
P(T<=t) one-tail 0.20272538
t Critical one-tail 1.739606726
P(T<=t) two-tail 0.40545076
t Critical two-tail 2.109815578  
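
Both two-sample outputs above use the unequal-variance (Welch) form of the t-test. A minimal sketch of that calculation from the summary rows is shown here. The function name `welch_t` is illustrative. The degrees of freedom use the Welch-Satterthwaite formula, and the df row in the output appears to be this value rounded to the nearest whole number.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance t-test from summary statistics.

    Returns (t statistic, Welch-Satterthwaite degrees of freedom).
    """
    se1_sq = var1 / n1
    se2_sq = var2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t_stat, df

# Away games vs. home games (values copied from the first output).
t_games, df_games = welch_t(74.4, 175.1142857, 15,
                            77.06666667, 102.6380952, 15)

# Leader A vs. Leader B (values copied from the second output).
t_leader, df_leader = welch_t(1.654545455, 0.230727273, 11,
                              1.418181818, 0.613636364, 11)

print(f"Home/away: t = {t_games:.4f}, df = {df_games:.2f}")
print(f"Leaders:   t = {t_leader:.4f}, df = {df_leader:.2f}")
```

In both examples the absolute t Stat is below the two-tail critical value in the output, so neither difference in means is statistically significant at the .05 level.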

Paired Sample t-test

A researcher wanted to know whether an athlete’s scandal impacted their brand image. The researcher conducted a study in which participants rated the athlete’s likeability from 1 to 7, were then told about the scandal, and rated the athlete’s likeability again.

t-Test: Paired Two Sample for Means
  Pre-test Post-test
Mean 6.090909091 4.272727273
Variance 1.090909091 1.818181818
Observations 11 11
Pearson Correlation 0.122644473
Hypothesized Mean Difference 0
df 10
t Stat 3.766217886
P(T<=t) one-tail 0.001842172
t Critical one-tail 1.812461123
P(T<=t) two-tail 0.003684345
t Critical two-tail 2.228138852  

Paired Sample t-test

Student athletes were provided training on concussion awareness. Before the training, they were asked to rate whether they believed there was a strong possibility they would have a concussion during their playing career. They were asked the same question after they completed the training. Using a paired-sample t-test, was there a change in the athletes' perception of suffering a concussion?

t-Test: Paired Two Sample for Means
  Concussion Pre-test Concussion Post-test
Mean 3.909090909 5.909090909
Variance 4.290909091 0.890909091
Observations 11 11
Pearson Correlation -0.362669109
Hypothesized Mean Difference 0
df 10
t Stat -2.581988897
P(T<=t) one-tail 0.01366146
t Critical one-tail 1.812461123
P(T<=t) two-tail 0.02732292
t Critical two-tail 2.228138852  
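
The paired outputs above can also be reproduced from their summary rows, because the variance of the pre/post differences follows from the two variances and the Pearson correlation. The sketch below assumes that identity. The helper name `paired_t` is hypothetical.

```python
import math

def paired_t(mean1, var1, mean2, var2, r, n):
    """Paired t-test from summary statistics.

    Uses Var(d) = var1 + var2 - 2 * r * sd1 * sd2 for the differences d.
    Returns (t statistic, degrees of freedom).
    """
    var_diff = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
    standard_error = math.sqrt(var_diff / n)
    return (mean1 - mean2) / standard_error, n - 1

# Likeability before vs. after hearing about the scandal.
t_like, df_like = paired_t(6.090909091, 1.090909091,
                           4.272727273, 1.818181818,
                           0.122644473, 11)

# Concussion perception before vs. after the training.
t_conc, df_conc = paired_t(3.909090909, 4.290909091,
                           5.909090909, 0.890909091,
                           -0.362669109, 11)

t_critical_two_tail = 2.228138852  # df = 10, copied from the outputs
for label, t in (("Likeability", t_like), ("Concussion", t_conc)):
    print(f"{label}: t = {t:.4f}, significant: {abs(t) > t_critical_two_tail}")
```

Both absolute t statistics exceed the two-tail critical value, which agrees with the two-tail p-values in the outputs being below .05: likeability dropped after the scandal, and perceived concussion risk rose after the training.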

Linear Regression

Used to describe the relationship between the independent variable and dependent variable.

Used to predict a certain outcome:

Does the number of hours an athlete practices improve his performance?

Does exercising 4-5x a week improve cardiovascular health?

Does winning influence ticket sales?

Linear Regression in Excel

RQ: Does winning football games increase fan attendance?

Hypotheses: H0: There is no relationship between winning football games and fan attendance.

Ha: There is a relationship between winning football games and fan attendance.

Linear Regression Excel

Games Won Average Attendance
12 55,000
14 58,000
10 50,000
9 48,000
10 50,000
8 45,000
12 53,000
9 48,000
12 58,000
10 55,000
8 49,000
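
For readers who want to see what Excel's regression tool does with this table, here is a minimal least-squares sketch in plain Python. The function name `fit_line` is illustrative, and the data are the eleven rows above.

```python
games_won = [12, 14, 10, 9, 10, 8, 12, 9, 12, 10, 8]
attendance = [55000, 58000, 50000, 48000, 50000, 45000,
              53000, 48000, 58000, 55000, 49000]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(games_won, attendance)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The slope is the estimated change in average attendance for each additional win, and the intercept is the predicted attendance at zero wins.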

Linear Regression in Excel

Regression Statistics
Multiple R 0.88136916
R Square 0.77681159
Adjusted R Square 0.75201288
Standard Error 2160.2469
Observations 11

Correlation coefficient (Multiple R): measures the strength of the linear relationship between two variables and ranges from -1 to 1.

Coefficient of determination (R Square): evaluates goodness of fit, that is, the proportion of the variation in the dependent variable that the regression line explains.
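
The regression statistics can be recomputed from the raw games-won and attendance data. The sketch below is one way to do it with the standard library only. The adjusted R Square line uses the usual formula for a single predictor, and all variable names are illustrative.

```python
import math

games_won = [12, 14, 10, 9, 10, 8, 12, 9, 12, 10, 8]
attendance = [55000, 58000, 50000, 48000, 50000, 45000,
              53000, 48000, 58000, 55000, 49000]

n = len(games_won)
mean_x = sum(games_won) / n
mean_y = sum(attendance) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(games_won, attendance))
sxx = sum((x - mean_x) ** 2 for x in games_won)
syy = sum((y - mean_y) ** 2 for y in attendance)

r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_square = r ** 2                # coefficient of determination
k = 1                            # number of predictors (Games Won)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"r = {r:.4f}, R^2 = {r_square:.4f}, adjusted R^2 = {adjusted_r_square:.4f}")
```

An R Square of about 0.78 means that roughly 78% of the variation in average attendance is accounted for by games won in this sample.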

ANOVA
  df SS MS F Significance F
Regression 1 146181818 146181818 31.32467532 0.000335626
Residual 9 42000000 4666666.67
Total 10 188181818      

P-value: Significance F (0.000335626) is below .05, so the regression model is statistically significant.
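
The entries in the ANOVA table are related by simple arithmetic, sketched below with values copied from the table. Variable names are illustrative. The small difference from Excel in the last digits of F comes from the rounded sums of squares shown on the slide.

```python
import math

# Values copied from the ANOVA table.
ss_regression, df_regression = 146181818, 1
ss_residual, df_residual = 42000000, 9

# Mean square = sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares explained variation to unexplained variation.
f_stat = ms_regression / ms_residual

# The "Standard Error" in the regression statistics is the
# square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"MS residual = {ms_residual:.2f}")
print(f"F = {f_stat:.4f}, standard error = {standard_error:.4f}")
```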

Linear Regression in Excel

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 31000 3760.22652 8.24418418 1.73944E-05 22493.77664 39506.2234 22493.7766 39506.2234
Games Won 2000 357.3441745 5.59684512 0.000335626 1191.631316 2808.36868 1191.63132 2808.36868
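
The t Stat and 95% confidence limits in the Games Won row follow from the coefficient and its standard error. The sketch below shows the arithmetic. The critical value 2.262157 is an assumption brought in from a standard t table (two-tailed, alpha = .05, df = 9); it does not appear on the slide.

```python
# Values copied from the Games Won row of the coefficient table.
coefficient = 2000
standard_error = 357.3441745

# t Stat = coefficient divided by its standard error.
t_stat = coefficient / standard_error

# Assumed two-tailed critical t for df = 9 at alpha = .05 (from a t table).
t_critical = 2.262157163

margin = t_critical * standard_error
lower_95 = coefficient - margin
upper_95 = coefficient + margin

print(f"t = {t_stat:.4f}")
print(f"95% CI for Games Won: {lower_95:.2f} to {upper_95:.2f}")
```

Because the interval does not include zero, the data are consistent with a positive relationship between wins and attendance, in line with the P-value in the table.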

Y = (Games Won coefficient × X) + Intercept

Y = 2000(11) + 31,000 = 53,000, so a team that wins 11 games is predicted to average 53,000 fans.
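
The prediction equation can be wrapped in a small function to try other win totals. This is a sketch with an illustrative function name; the coefficients are the ones from the Excel output.

```python
INTERCEPT = 31000  # Intercept coefficient from the Excel output
SLOPE = 2000       # Games Won coefficient from the Excel output

def predict_attendance(games_won):
    """Predicted average attendance for a given number of wins."""
    return SLOPE * games_won + INTERCEPT

for wins in (8, 11, 14):
    print(wins, predict_attendance(wins))
```

The sample only covers seasons with 8 to 14 wins, so predictions outside that range should be treated with caution.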

Let’s Practice

Regression Homework Assignment

Conventionally:

|r| > 0.8 => very strong relationship

0.6 ≤ |r| ≤ 0.8 => strong relationship

0.4 ≤ |r| < 0.6 => moderate relationship

0.2 ≤ |r| < 0.4 => weak relationship

|r| < 0.2 => very weak relationship
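
These conventional bands can be expressed as a small lookup. The sketch below assumes each band extends up to the lower bound of the next stronger band, and that a value of exactly 0.8 counts as strong. The function name is illustrative.

```python
def relationship_strength(r):
    """Label a correlation coefficient using the conventional bands."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("r must be between -1 and 1")
    if magnitude > 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak"

# Multiple R from the attendance regression in these slides.
print(relationship_strength(0.88136916))
```

The sign of r gives the direction of the relationship; only its absolute value is used to judge strength.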

https://youtu.be/G8j8KAJtJlw (RMSE)

Video examples

https://youtu.be/-4T85cFY5_g