Regression Homework Assignment
Excel Analyses
Dr. Sarah M. Brown
Content Analyses
Research technique used to make replicable and valid inferences by interpreting and coding textual material (or pictures).
This qualitative research can be turned into quantitative research.
Content Analysis
Identify the text to be used
Identify the data set to be used
LeBron James Twitter page
Collect the data as appropriate given your research objectives (Hambrick et al., 2010)
Interactivity: direct communication with fellow athletes and fans
Diversion: non-sports related information provided by athlete
Information sharing: insight into the athlete's teammates or sport (details about practice, training, etc.)
Content: Links to pictures, videos or other websites.
Promotional: publicity for sponsorships, upcoming games, etc.
Fanship: athlete discusses sports or teams other than their own team and teammates.
Record data as required using an appropriate framework, making a note each time a category comes up.
Analyze data
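The record-and-analyze steps can be sketched in Python as a simple frequency tally. Only the six category names come from the framework above. The list of coded tweets is a hypothetical placeholder, not real data from the account, and `tally_codes` is an illustrative helper name.

```python
from collections import Counter

# The six categories from the coding framework (Hambrick et al., 2010).
CATEGORIES = [
    "Interactivity", "Diversion", "Information sharing",
    "Content", "Promotional", "Fanship",
]

def tally_codes(coded_items):
    """Count how often each category was assigned; unused categories get 0."""
    counts = Counter(coded_items)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return {category: counts.get(category, 0) for category in CATEGORIES}

# Hypothetical codes a researcher might assign to eight tweets (illustration only).
example_codes = [
    "Interactivity", "Promotional", "Interactivity", "Diversion",
    "Content", "Interactivity", "Fanship", "Promotional",
]

frequencies = tally_codes(example_codes)
total = sum(frequencies.values())
for category, count in frequencies.items():
    print(f"{category}: {count} ({count / total:.0%})")
```

The resulting counts are the quantitative data that the qualitative coding produces; they can then be compared across athletes or time periods.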
LeBron James Twitter
T-tests
A t-test is an inferential statistic used to determine if there is a significant difference between the means of two groups.
One sample t-test: test whether there is a statistical difference between a sample mean and a known or hypothesized population mean.
Two sample t-test: test whether the unknown population means of two independent groups are equal or not.
Paired sample t-test: test whether the mean difference between two related measurements (for example, a pre-test and a post-test on the same participants) is zero.
One Sample t-test
Professional sports teams use the vertical jump to gauge the strength of athletes. This test requires a person to stand flat-footed, leap vertically into the air, and touch the highest reachable mark. A typical NFL player can jump 30 inches. That is, the population average is 30 inches.
An athletic trainer recruited a group of 15 college athletes and had them perform the vertical jump test. Conduct a one-sample t-test to determine whether the mean vertical jump of the 15 college athletes was greater than the population average.
One sample t-test
| Vertical Jump (inches) | Population Average (inches) |
| 25 | 30 |
| 31 | 30 |
| 32 | |
| 27 | |
| 33 | |
| 34 | |
| 31 | |
| 30 | |
| 29 | |
| 27 | |
| 32 | |
| 28 | |
| 28 | |
| 27 | |
| 29 |
One-sample t-test results
Results: Although the mean vertical jump of the 15 college athletes (29.53 inches) is less than the population average of 30 inches, the difference is not statistically significant (t = -0.70, df = 14, p > .05). Fail to reject the null hypothesis.
| t-Test: Two-Sample Assuming Unequal Variances | |||
| | Vertical Jump | | |
| Mean | 29.53333333 | 30 | |
| Variance | 6.695238095 | 0 | |
| Observations | 15 | 2 | |
| Hypothesized Mean Difference | 0 | ||
| df | 14 | ||
| t Stat | -0.698504804 | ||
| P(T<=t) one-tail | 0.248150781 | ||
| t Critical one-tail | 1.761310136 | ||
| P(T<=t) two-tail | 0.496301562 | ||
| t Critical two-tail | 2.144786688 | ||
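The Excel output above appears to come from the two-sample tool with a constant second column (two observations of 30, variance 0), which gives the same t Stat and df as a one-sample test. A minimal sketch of the direct calculation, using the 15 jumps from the data table and only the Python standard library, is shown here. The function name `one_sample_t` is illustrative, and the critical value is copied from the Excel output rather than computed.

```python
import math
import statistics

# Vertical jump data (inches) from the data table in this example.
jumps = [25, 31, 32, 27, 33, 34, 31, 30, 29, 27, 32, 28, 28, 27, 29]
population_mean = 30  # hypothesized population average (inches)

def one_sample_t(sample, mu):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance (n - 1 denominator)
    standard_error = math.sqrt(variance / n)
    return (mean - mu) / standard_error, n - 1

t_stat, df = one_sample_t(jumps, population_mean)
t_critical_two_tail = 2.144786688  # copied from the Excel output for df = 14

print(f"t = {t_stat:.4f}, df = {df}")
print("Reject H0" if abs(t_stat) > t_critical_two_tail else "Fail to reject H0")
```

Because |t| is smaller than the critical value, the decision matches the slide: fail to reject the null hypothesis.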
One-sample t-test
The average 40-yard dash time for a wide receiver in the NFL is 4.48 seconds.
| t-Test: Two-Sample Assuming Unequal Variances | ||
| | 40-Yard Dash | |
| Mean | 4.533333333 | 4.48 |
| Variance | 0.035238095 | 0 |
| Observations | 15 | 2 |
| Hypothesized Mean Difference | 0 | |
| df | 14 | |
| t Stat | 1.100368489 | |
| P(T<=t) one-tail | 0.144866221 | |
| t Critical one-tail | 1.761310136 | |
| P(T<=t) two-tail | 0.289732442 | |
| t Critical two-tail | 2.144786688 |
Two Sample t-test
A researcher was interested in understanding whether home-field advantage (HFA) impacted total scoring for a basketball team. To start exploring this phenomenon, the researcher compared total points scored in home games versus away games for one season.
| t-Test: Two-Sample Assuming Unequal Variances | |||
| | Away Games | Home Games | |
| Mean | 74.4 | 77.06666667 | |
| Variance | 175.1142857 | 102.6380952 | |
| Observations | 15 | 15 | |
| Hypothesized Mean Difference | 0 | ||
| df | 26 | ||
| t Stat | -0.619705665 | ||
| P(T<=t) one-tail | 0.27042247 | ||
| t Critical one-tail | 1.70561792 | ||
| P(T<=t) two-tail | 0.540844939 | ||
| t Critical two-tail | 2.055529439 |
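The t Stat and df in this output can be reproduced from the summary rows alone. The sketch below assumes the unequal-variances (Welch) form named in the table title. `welch_t` is an illustrative helper name, and the inputs are copied from the Mean, Variance, and Observations rows.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard error of each mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics copied from the Excel output (away games vs. home games).
t_stat, df = welch_t(74.4, 175.1142857, 15, 77.06666667, 102.6380952, 15)
print(f"t = {t_stat:.4f}, df = {df:.2f} (Excel reports df rounded to a whole number)")
```

The same function applied to the Leader A and Leader B summary rows in Example #2 gives a t Stat of about 0.853 and a df that rounds to 17.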
Two Sample t-test Example #2
An organization wanted to know whether certain groups performed better under different leadership styles. To investigate, the researcher evaluated productivity through monthly sales.
| t-Test: Two-Sample Assuming Unequal Variances | |||
| | Leader A | Leader B | |
| Mean | 1.654545455 | 1.418181818 | |
| Variance | 0.230727273 | 0.613636364 | |
| Observations | 11 | 11 | |
| Hypothesized Mean Difference | 0 | ||
| df | 17 | ||
| t Stat | 0.853124167 | ||
| P(T<=t) one-tail | 0.20272538 | ||
| t Critical one-tail | 1.739606726 | ||
| P(T<=t) two-tail | 0.40545076 | ||
| t Critical two-tail | 2.109815578 |
Paired Sample t-test
A researcher wanted to know whether an athlete's scandal impacted their brand image. The researcher conducted a study in which participants rated the athlete's likeability on a scale from 1 to 7, were then told about the scandal, and were asked to rate the athlete's likeability again.
| t-Test: Paired Two Sample for Means | |||
| | Pre-test | Post-test | |
| Mean | 6.090909091 | 4.272727273 | |
| Variance | 1.090909091 | 1.818181818 | |
| Observations | 11 | 11 | |
| Pearson Correlation | 0.122644473 | ||
| Hypothesized Mean Difference | 0 | ||
| df | 10 | ||
| t Stat | 3.766217886 | ||
| P(T<=t) one-tail | 0.001842172 | ||
| t Critical one-tail | 1.812461123 | ||
| P(T<=t) two-tail | 0.003684345 | ||
| t Critical two-tail | 2.228138852 |
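A paired t-test is normally computed from the individual pre/post differences, which are not listed on the slide. As a check under that limitation, the sketch below recovers the t Stat from the summary rows, using the identity Var(pre - post) = Var(pre) + Var(post) - 2 r SD(pre) SD(post). The helper name `paired_t_from_summary` is illustrative, and the critical value is copied from the Excel output.

```python
import math

def paired_t_from_summary(mean_pre, var_pre, mean_post, var_post, r, n):
    """Paired t statistic and df from means, variances, and Pearson r."""
    # Variance of the pre-minus-post differences.
    var_diff = var_pre + var_post - 2 * r * math.sqrt(var_pre * var_post)
    standard_error = math.sqrt(var_diff / n)
    return (mean_pre - mean_post) / standard_error, n - 1

# Values copied from the likeability output (pre-test vs. post-test).
t_stat, df = paired_t_from_summary(
    6.090909091, 1.090909091, 4.272727273, 1.818181818, 0.122644473, 11
)
t_critical_two_tail = 2.228138852  # from the Excel output for df = 10

print(f"t = {t_stat:.4f}, df = {df}")
print("Reject H0" if abs(t_stat) > t_critical_two_tail else "Fail to reject H0")
```

Here |t| exceeds the critical value, so the drop in likeability after the scandal is statistically significant at the .05 level.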
Paired Sample t-test
Student athletes were provided training on concussion awareness. Before the training, they were asked to rate whether they believed there was a strong possibility they would have a concussion during their playing career. They were asked the same question after they completed the training. Using a paired-sample t-test, was there a change in the athletes' perception of their likelihood of suffering a concussion?
| t-Test: Paired Two Sample for Means | ||
| | Concussion Pre-test | Concussion Post-test |
| Mean | 3.909090909 | 5.909090909 |
| Variance | 4.290909091 | 0.890909091 |
| Observations | 11 | 11 |
| Pearson Correlation | -0.362669109 | |
| Hypothesized Mean Difference | 0 | |
| df | 10 | |
| t Stat | -2.581988897 | |
| P(T<=t) one-tail | 0.01366146 | |
| t Critical one-tail | 1.812461123 | |
| P(T<=t) two-tail | 0.02732292 | |
| t Critical two-tail | 2.228138852 |
Linear Regression
Used to describe the relationship between the independent variable and dependent variable.
Used to predict a certain outcome:
Does the number of hours an athlete practices improve his performance?
Does exercising 4-5x a week improve cardiovascular health?
Does winning influence ticket sales?
Linear Regression in Excel
RQ: Does winning football games increase fan attendance?
Hypotheses: H0: There is no relationship between winning football games and fan attendance.
Ha: There is a relationship between winning football games and fan attendance.
Linear Regression Excel
| Games Won | Average Attendance |
| 12 | 55,000 |
| 14 | 58,000 |
| 10 | 50,000 |
| 9 | 48,000 |
| 10 | 50,000 |
| 8 | 45,000 |
| 12 | 53,000 |
| 9 | 48,000 |
| 12 | 58,000 |
| 10 | 55,000 |
| 8 | 49,000 |
Linear Regression in Excel
| Regression Statistics | ||
| Multiple R | 0.88136916 | |
| R Square | 0.77681159 | |
| Adjusted R Square | 0.75201288 | |
| Standard Error | 2160.2469 | |
| Observations | 11 |
Correlation coefficient (Multiple R): measures the strength of a linear relationship between two variables; ranges from -1 to +1.
Coefficient of determination (R Square): evaluates goodness of fit; the proportion of the variation in the dependent variable that is explained by the regression line.
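Both statistics can be reproduced from the games-won and attendance data table. This is a minimal sketch using only the standard library; `pearson_r` is an illustrative helper name, and the variable names are chosen here for readability.

```python
import math

# Data from the Games Won / Average Attendance table.
games_won = [12, 14, 10, 9, 10, 8, 12, 9, 12, 10, 8]
attendance = [55000, 58000, 50000, 48000, 50000, 45000,
              53000, 48000, 58000, 55000, 49000]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(games_won, attendance)
r_squared = r ** 2  # with one predictor, R Square is the square of r

print(f"Multiple R = {r:.4f}, R Square = {r_squared:.4f}")
```

An R Square of about 0.78 means that roughly 78% of the variation in average attendance is accounted for by games won in this sample.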
| ANOVA | |||||
| | df | SS | MS | F | Significance F |
| Regression | 1 | 146181818 | 146181818 | 31.32467532 | 0.000335626 |
| Residual | 9 | 42000000 | 4666666.67 | ||
| Total | 10 | 188181818 |
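P-value: Significance F (0.000335626) is less than .05, so we reject H0; games won is a statistically significant predictor of average attendance.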
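The columns of the ANOVA table are linked by simple arithmetic: each MS is SS divided by df, and F is the ratio of the two MS values. The sketch below recomputes them from the SS and df entries copied from the table. It does not compute Significance F, which needs the F distribution and is taken from Excel.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 146181818, 1
ss_residual, df_residual = 42000000, 9

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_stat = ms_regression / ms_residual            # F statistic
r_squared = ss_regression / (ss_regression + ss_residual)  # SSR / SST
standard_error = math.sqrt(ms_residual)         # "Standard Error" in Regression Statistics

print(f"F = {f_stat:.4f}")
print(f"R Square = {r_squared:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```

These values agree with the F, R Square, and Standard Error entries reported by Excel, which is a quick way to check that the tables were read correctly.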
Linear Regression in Excel
| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
| Intercept | 31000 | 3760.22652 | 8.24418418 | 1.73944E-05 | 22493.77664 | 39506.2234 | 22493.7766 | 39506.2234 |
| Games Won | 2000 | 357.3441745 | 5.59684512 | 0.000335626 | 1191.631316 | 2808.36868 | 1191.63132 | 2808.36868 |
Y = (Games Won coefficient × X) + Intercept
For X = 11 games won: Y = 2000(11) + 31000 = 53,000 (predicted average attendance)
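The coefficients and the prediction can be checked with an ordinary least-squares fit on the same eleven seasons. This is a sketch using the textbook slope and intercept formulas, not Excel's internal routine; `fit_line` is an illustrative helper name.

```python
# Data from the Games Won / Average Attendance table.
games_won = [12, 14, 10, 9, 10, 8, 12, 9, 12, 10, 8]
attendance = [55000, 58000, 50000, 48000, 50000, 45000,
              53000, 48000, 58000, 55000, 49000]

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(games_won, attendance)
predicted = slope * 11 + intercept  # predicted attendance for 11 wins

print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
print(f"predicted attendance for 11 wins = {predicted:.0f}")
```

The slope is read as the expected change in average attendance for each additional game won.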
Let’s Practice
Regression Homework Assignment
Conventionally:
|r| > 0.8 => very strong relationship
0.6 ≤ |r| ≤ 0.8 => strong relationship
0.4 ≤ |r| < 0.6 => moderate relationship
0.2 ≤ |r| < 0.4 => weak relationship
|r| < 0.2 => very weak relationship
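The conventional labels above can be applied with a small lookup function. This is a sketch: the function name is illustrative, and placing a value of exactly 0.8 in the "strong" band is an assumption, since the cutoffs are rules of thumb rather than a formal standard.

```python
def relationship_strength(r):
    """Return the conventional label for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude > 0.8:
        return "very strong"
    if magnitude >= 0.6:  # assumption: exactly 0.8 counts as "strong"
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak"

# Correlations taken from the Excel outputs in this deck.
for label, r in [("Games won vs. attendance", 0.88136916),
                 ("Likeability pre vs. post", 0.122644473),
                 ("Concussion pre vs. post", -0.362669109)]:
    print(f"{label}: r = {r:.3f} ({relationship_strength(r)})")
```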
https://youtu.be/G8j8KAJtJlw (RMSE)
Video examples
https://youtu.be/-4T85cFY5_g