Research Proposal
MARK977: Research for Marketing Decisions
Dr. Thomas Lee Trimester 1, 2018
Data Analysis: Data Preparation and Basic Concepts
WEEK 10 READINGS: CHAPTERS 12 AND 13
Recap: Hypothesis testing techniques
MARK977 Research for Marketing Decisions T1 20183
Hypothesis Testing
• Test of association
– Chi-square test
– Correlation
– Regression
• Test of differences
– One-sample t-test
– Independent-samples t-test
– Paired-samples t-test
– One-way ANOVA
Recap: Probability values in hypothesis testing
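• The p-value is the probability, assuming H0 is true, of obtaining a result at least as extreme as the one observed in the sample. Equivalently, it is the smallest level of significance at which H0 would be rejected. The significance level itself (typically 0.05 or 5%) is the accepted risk of making a Type I error, i.e., rejecting a true H0
• In general, the smaller the p-value, the stronger the sample evidence against H0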
– E.g., p-value of 0.78 vs. 0.02
• If the computed probability estimate or p-value is smaller than the significance level (usually 5% or 0.05), reject H0. If the probability estimate or p-value is larger than the significance level, do not reject H0.
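The decision rule above can be written as a minimal Python sketch. The function name decide and the default alpha of 0.05 are illustrative choices, not part of the course materials.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# The two p-values used as examples on the slide
for p in (0.78, 0.02):
    print(f"p = {p:.2f}: {decide(p)}")
```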
Definitions
Correlation analysis
• Assesses the strength of the relationship between two variables
Correlation coefficient
• Provides a measure of the degree to which there is an association between two variables (X and Y)
Correlation analysis
Pearson correlation coefficient
• Measures the degree to which there is a straight-line or linear association between two interval-scaled (or ratio-scaled) variables, say X and Y.
• Originally proposed by Karl Pearson, it is also known as the product moment correlation. It is also referred to as simple correlation, bivariate correlation, correlation coefficient or merely correlation.
• A positive correlation reflects a tendency for a high value in one variable to be associated with a high value in the second variable
• A negative correlation reflects an association between a high value in one variable and a low value in the second variable
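As an illustration of the definition, r can be computed directly from the deviations of X and Y around their means. This is a minimal sketch using only the Python standard library; the data values are hypothetical and chosen only to show one positive and one negative association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariation of X and Y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (not from the course datasets)
ad_spend = [1, 2, 3, 4, 5]
store_visits = [12, 15, 19, 22, 30]   # tends to rise with ad_spend
complaints = [9, 8, 6, 5, 2]          # tends to fall as ad_spend rises

print("positive association:", round(pearson_r(ad_spend, store_visits), 3))
print("negative association:", round(pearson_r(ad_spend, complaints), 3))
```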
Scatter plots
Scatter plots (cont.)
Testing significance of correlation coefficient
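• Null hypothesis: H0: ρ = 0 (no linear relationship in the population, where ρ is the population correlation estimated by the sample r)
• Alternative hypothesis: Ha: ρ ≠ 0 (a linear relationship exists)
• If the correlation coefficient is statistically significant (i.e., p < .05), the null hypothesis should be rejected in favour of the alternative hypothesis.
• If it is not significant, we cannot conclude that a linear relationship exists in the population, so the sample value of r should not be interpreted further.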
Correlation example
• Suppose a researcher wants to assess the relationship between attitudes toward motorcycles (attitude) and the number of years the respondent has owned a motorcycle (duration).
• Attitude
– 1 = do not like motorcycles
– 11 = very much like motorcycles
• Duration
– (Raw) Number of years respondent has owned one or more motorcycles
Correlation example (cont.)
Correlation example (cont.)
• Given p < .05, H0 should be rejected.
• The value of r, 0.936, is close to 1.0, which means the number of years a respondent has owned a motorcycle is strongly associated with the respondent’s attitude towards motorcycles.
• Positive sign of r implies a positive relationship – the longer the duration of motorcycle ownership, the more favourable the attitude, and vice versa.
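The significance of r = 0.936 can be checked by hand with the usual t statistic for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²), with n − 2 degrees of freedom. The sketch below assumes n = 12 respondents; that figure is not stated on this slide and is inferred from the total df of 11 in the regression output for the motorcycle example. The critical value 2.228 is the standard two-tailed t value for α = .05 and df = 10.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing H0: no linear relationship, df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.936      # from the correlation example
n = 12         # assumed sample size (inferred, see lead-in)
T_CRIT = 2.228 # two-tailed critical t, alpha = .05, df = 10

t = t_for_correlation(r, n)
print(f"t = {t:.2f}, critical t = {T_CRIT}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```

SPSS reports the exact p-value directly, so this comparison against a critical value is only a cross-check on the printed output.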
Correlation: Key considerations
• The correlation coefficient (r) measures only the strength of association (or covariation) between two variables; it does not imply any causal relationship between them.
• It does not describe the form of the relationship (e.g., how much Y changes for a one-unit change in X); that is the role of regression analysis.
Regression analysis
• Statistical technique that is used to relate two or more variables
• Objective is to build a regression model or a prediction equation relating the dependent variable to one or more independent variables
• The model can then be used to describe, predict, and control the variable of interest on the basis of the independent variables
Types of regression analysis
Bivariate regression analysis
– Regression analysis that involves a single metric dependent variable and a single metric independent variable. E.g.,
• DV = market share
• IV = size of sales force
Multiple regression analysis
– Regression analysis that involves more than one independent variable. E.g.,
• DV = market share
• IVs = size of sales force, advertising expenditure and sales promotion budget
Regression analysis: Simple Linear Regression Model
Yi = β0 + β1Xi + εi
Where
• Y = Dependent variable (e.g., no. of people entering store)
• X = Independent variable (e.g., advertising expenditure the day before)
• β0 = Model parameter that represents mean value of dependent variable (Y) when the independent variable (X) is zero (i.e., intercept)
• β1 = Model parameter that represents the slope that measures change in mean value of dependent variable associated with a one-unit increase in the independent variable (e.g., slope – negative vs. positive, steep vs. flat)
• εi = Error term that describes the effects on Yi of all factors other than value of Xi (e.g., variables other than advertising expenditures – store location, weather, etc.)
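To make the intercept and slope concrete, the following sketch estimates β0 and β1 by ordinary least squares using only the Python standard library. The advertising and store-traffic figures are hypothetical, and the function name fit_simple_regression is illustrative.

```python
def fit_simple_regression(xs, ys):
    """Return least-squares estimates (b0, b1) for Y = b0 + b1*X + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: change in mean Y per one-unit change in X
    b0 = mean_y - b1 * mean_x   # intercept: mean Y when X is zero
    return b0, b1


# Hypothetical data: advertising spend the day before vs. people entering the store
ad_spend = [1, 2, 3, 4, 5]
visitors = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_regression(ad_spend, visitors)
print(f"estimated model: visitors = {b0:.1f} + {b1:.1f} * ad_spend")

# Residuals play the role of the error term for each observation
residuals = [y - (b0 + b1 * x) for x, y in zip(ad_spend, visitors)]
print("residuals:", [round(e, 2) for e in residuals])
```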
Multiple regression
• A linear combination of predictor variables is used to predict the outcome (response) variable
• The general form of the multiple regression model is:
Y = β0 + β1X1 + β2X2 + . . . + βkXk + ε
where β0 is the intercept, β1, β2, . . . , βk are regression coefficients associated with the independent variables X1, X2, . . . , Xk and ε is the error or residual.
Bivariate vs. multiple regression
[Figure: two path diagrams. Bivariate regression: Size of sales force → Market share. Multiple regression: Size of sales force, Advertising expenditure and Promotion budget → Market share.]
Predicting the DV – R square
• The measure of the regression model’s ability to predict is called the coefficient of determination (R²), the percentage of explained variation
– E.g., if R² = .74, it means 74% of the total variation of Y (DV) is explained or accounted for by X (IV)
– Hence, R² values range from 0 to 1
– The larger the R² value, the more variation in the DV is being explained by the IV
• Adjusted R Square – adjusted for the number of independent variables and sample size
• F test – used to test the null hypothesis that the coefficient of determination in the population is zero. If rejected, one or more partial regression coefficients have a value different from zero.
Testing the significance of IVs
Null Hypothesis (H0: β1 = 0)
• There is no linear relationship between the independent & dependent variables
• E.g., there is no linear relationship between advertising expenditure (IV) and sales (DV)
Alternative Hypothesis (Ha: β1 ≠ 0)
• There is a linear relationship between the independent & dependent variables
• E.g., there is a linear relationship between advertising expenditure (IV) and sales (DV)
Evaluating the importance of IVs
• Beta coefficients (β) – represent the slope, i.e., the change in the mean value of the dependent variable associated with a one-unit increase in the independent variable, holding any other IVs constant
• Beta coefficients can be positive or negative, and each has an associated p-value
• If the beta coefficient is not statistically significant (p > .05), there is insufficient evidence of a relationship between that predictor and the DV
• Examine significance, direction and strength
– E.g., if the β is .70 and statistically significant (p < .05), it means for each unit increase in the IV, the DV will increase by .70 units
• Use (standardized) beta coefficients when independent variables are in different units of measurement. E.g.,
– Advertising expenditure (thousands)
– Frequency/intensity of promotion (1=not frequent/intense at all, 7=very frequent/intense)
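The reason for preferring standardized coefficients when units differ can be seen in a small sketch. A standardized beta rescales the unstandardized slope b by the ratio of standard deviations, beta = b × (SD of X / SD of Y), so it does not change when the IV is re-expressed in different units. The data and function names below are hypothetical, and a single IV is used to keep the arithmetic short.

```python
import statistics


def slope(xs, ys):
    """Least-squares slope b for a single IV."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def standardized_beta(b, xs, ys):
    """Rescale an unstandardized slope by the ratio of standard deviations."""
    return b * statistics.stdev(xs) / statistics.stdev(ys)


# Hypothetical data: the same advertising spend in thousands and in dollars
spend_thousands = [10, 12, 15, 18, 20]
spend_dollars = [x * 1000 for x in spend_thousands]
sales = [110, 118, 131, 140, 152]

b_thousands = slope(spend_thousands, sales)
b_dollars = slope(spend_dollars, sales)
beta_thousands = standardized_beta(b_thousands, spend_thousands, sales)
beta_dollars = standardized_beta(b_dollars, spend_dollars, sales)

print("unstandardized b:", round(b_thousands, 4), "vs", round(b_dollars, 6))
print("standardized beta:", round(beta_thousands, 4), "vs", round(beta_dollars, 4))
```

The unstandardized slopes differ by a factor of 1,000 purely because of the unit change, while the standardized betas agree, which is what makes them comparable across IVs.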
So …
Significance testing in regression involves:
1. Testing the significance of the overall regression equation or model – reject or fail to reject H0 of no relationship between IV(s) and DV?
2. If significant, check percentage of variation explained (R²) – but what kind of variation?
3. Then, check individual path coefficients to determine effect on DV (β) – whether negative or positive, strong or weak, etc.
Regression example
Suppose a researcher wishes to determine whether attitude towards motorcycles can be explained by duration of motorcycle ownership and importance attached to performance
Step 1: Identify variables of interest
• DV = attitude
• IVs = duration and importance
Regression example (cont.)
Step 2: Develop hypotheses
• Null: there is no relationship
• Alternative: there is a relationship
Step 3: Choose relevant test
• Given more than one IV, multiple regression is preferred
Regression example (cont.)
Step 4: Perform test
Regression example (cont.)
Step 5: Compare test statistic and critical value
Variables Entered/Removed (a)
Model   Variables Entered          Variables Removed   Method
1       Importance, Duration (b)   .                   Enter
a. Dependent Variable: Attitude
b. All requested variables entered.

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .972 (a)   .945       .933                .860
a. Predictors: (Constant), Importance, Duration

ANOVA (a)
Model           Sum of Squares   df   Mean Square   F        Sig.
1 Regression    114.264          2    57.132        77.294   .000 (b)
  Residual      6.652            9    .739
  Total         120.917          11
a. Dependent Variable: Attitude
b. Predictors: (Constant), Importance, Duration

Coefficients (a)
                Unstandardized Coefficients   Standardized Coefficients
Model           B        Std. Error           Beta                        t       Sig.
1 (Constant)    .337     .567                                             .595    .567
  Duration      .481     .059                 .764                        8.160   .000
  Importance    .289     .086                 .314                        3.353   .008
a. Dependent Variable: Attitude
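The Model Summary and ANOVA figures are linked, and recomputing them is a useful check when reading SPSS output. The sketch below takes the sums of squares and degrees of freedom from the ANOVA table and derives R Square, Adjusted R Square and F. The sample size n = 12 is inferred from the total df of 11, and small differences in the last decimals come from rounding in the printed table.

```python
# Values read from the ANOVA table above
ss_regression = 114.264
ss_residual = 6.652
df_regression = 2          # k, the number of IVs
df_residual = 9            # n - k - 1
n = df_regression + df_residual + 1   # inferred sample size (12)

ss_total = ss_regression + ss_residual

# R Square: share of total variation in the DV explained by the IVs
r_square = ss_regression / ss_total

# Adjusted R Square: penalises R Square for the number of IVs relative to n
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# F: explained variation per IV relative to unexplained variation per residual df
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R Square          = {r_square:.3f}")
print(f"Adjusted R Square = {adj_r_square:.3f}")
print(f"F                 = {f_value:.2f}")
```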
Regression example (cont.)
Step 6: Determine whether to reject null and draw marketing research conclusion
• There is a relationship (p < .05)
• Predictors significantly explain variation in DV (R Square = 0.945; Adjusted R Square = 0.933)
– Both duration and importance contribute to explaining the variation in attitude
• Coefficients for both duration (β = .764, p < .05) and importance (β = .314, p < .05) are positive and significant
– Therefore, both duration and importance are important in explaining attitude
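While the standardized betas are used to compare the two predictors, prediction uses the unstandardized B values from the Coefficients table. A minimal sketch follows; the respondent profile (10 years of ownership, importance rating of 6) is hypothetical, and the function name is illustrative.

```python
# Unstandardized coefficients (B) from the Coefficients table
B_CONSTANT = 0.337
B_DURATION = 0.481
B_IMPORTANCE = 0.289


def predict_attitude(duration: float, importance: float) -> float:
    """Predicted attitude = constant + B_duration*duration + B_importance*importance."""
    return B_CONSTANT + B_DURATION * duration + B_IMPORTANCE * importance


# Hypothetical respondent: 10 years of ownership, importance rating of 6
predicted = predict_attitude(duration=10, importance=6)
print(f"predicted attitude: {predicted:.2f}")

# Each extra year of ownership raises predicted attitude by B_DURATION,
# holding importance constant
extra_year = predict_attitude(11, 6) - predict_attitude(10, 6)
print(f"effect of one more year: {extra_year:.3f}")
```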
How to report
Variables       Model: Attitude
Duration        .764***
Importance      .314**
Lifestyle       … …
Personality     … …
R²              .945
Adjusted R²     .933
F-value         77.294***
Notes: NS not significant; *p < .05; **p < .01; ***p < .001; standardised regression coefficients are reported.
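The star notation in the notes can be applied mechanically to the Sig. values from SPSS. The helper below is a small illustrative sketch (the function name is hypothetical); note that SPSS prints very small p-values as .000, which falls under the p < .001 threshold.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the reporting convention in the table notes."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# (variable, standardized beta, Sig.) taken from the Coefficients table
rows = [("Duration", 0.764, 0.000), ("Importance", 0.314, 0.008)]

for name, beta, sig in rows:
    print(f"{name:<12}{beta:.3f}{significance_stars(sig)}")
```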
SPSS Exercise
Today’s activities:
• Correlation
• Regression
Correlation demo
• Suppose a researcher wants to examine the relationship between attitudes towards motorcycles (attitude) and the number of years the respondent has owned a motorcycle (duration). Answer the following questions:
– What are the null and alternative hypotheses?
– Are attitude and duration related positively or negatively?
• Open the “Motorcycle” dataset
• Follow your tutor’s instructions
• Refer to the “Correlation” guide on Moodle
Regression demo
• Suppose a researcher wants to know whether attitude towards motorcycles can be explained by duration of motorcycle ownership and importance attached to performance. Address the following issues:
– Determine whether the IVs explain a significant variation in the DV:
• Does a relationship exist?
– How much variation in the DV can be explained?
• What is the strength of the relationship?
• What is the direction of the relationship?
– What are the implications for practice?
• Follow your tutor’s instructions
• Refer to the “Regression” guide on Moodle
Practice
• Open the “Internet Usage” dataset
• Answer the following questions:
1. Find simple correlations between the following sets of variables:
a. Internet usage and attitude towards the Internet
b. Internet usage and attitude towards technology
c. Attitude towards the Internet and attitude towards technology
2. Run a bivariate regression, with Internet usage as the DV and attitude towards the Internet as the IV. Interpret the results.
Practice (cont.)
• Open the “Internet Usage” dataset
• Answer the following questions:
3. Run a bivariate regression, with Internet usage as the DV and attitude towards technology as the IV. Interpret the results.
4. Run a multiple regression, with Internet usage as the DV, and attitude towards the Internet and attitude towards technology as the IVs. Interpret the results.
Practice for Tutorial/Lab Task 3
• Re-run all the demos from Week 8 to Week 10:
– Variable recoding
– Variable re-specification
– Frequency distribution
– Chi-square test
– One-sample t-test
– Independent-samples t-test
– Paired-samples t-test
– Correlation
– Regression
• Attempt all the practice questions from Week 8 to Week 10
• Seek help if stuck