StataSE 17 Questions

MMD010 Assignment

(November 2021)

Suppose we are interested in the popularity of movies (the dependent variable, measured by an index).

Use the movie dataset (Movies_data.csv) and answer the following questions:

1. Check the distribution of Popularity. Visualise it using a histogram and a density plot. Discuss the shape of the distribution and compare it to the Normal distribution. Apply a log transformation if needed and explain your reasoning.

2. Summarize the following variables in a table that presents the number of observations, mean, standard deviation, minimum and maximum: Popularity, Total Cast (number of actors), Total Crew (number of people involved in movie production) and Genre Adventure (an indicator for adventure movies). Interpret the results.

3. Create a correlation matrix of the above-mentioned variables (Popularity, Total Cast, Total Crew and Genre Adventure) and interpret the results.

4. Conduct a t-test to check whether popularity differs significantly between adventure and non-adventure movies. Explain the findings.

5. Visualise the relationship between Popularity and Total Cast using a scatter plot. Explain what you observe. [hint: would you use Popularity or ln Popularity?]

6. Based on the hypothesis that having more actors involved in movie production leads to higher popularity, run a simple regression of Popularity on Total Cast. Export the table and interpret the results, addressing the interpretation of the coefficient, its significance and the R squared. [hint: would you use Popularity or ln Popularity as the dependent variable? If ln Popularity is used as the dependent variable, how would you interpret the coefficient?]

7. To obtain the ‘true’ effect of Total Cast on Popularity, we need to use control variables. In our data, one such control variable is Total Crew. Run a multiple regression by adding Total Crew as a control variable to the simple regression in Q6. Compare the simple and the multiple regression: how do the interpretation of the coefficients and the model fit change? Check whether there is a multicollinearity issue after adding the additional variable to the regression [hint: command vif]. Finally, select the better of the two models and explain your reasoning.

8. The popularity of a movie might depend on its type. Add Genre Adventure to the multiple regression in Q7 and interpret the results. Is there evidence that adventure movies are more popular than non-adventure movies? Explain the findings. Another argument is that the relationship between Total Cast and Popularity varies by movie type. To check this hypothesis, add an interaction term between Genre Adventure and Total Cast to the multiple regression. Export the table and interpret the results. [hint: check whether the interaction term is significant and what sign it has]

9. Using the same dataset, develop a hypothesis. Based on your hypothesis, select the dependent variable, the independent variable and the control variable(s), and run a multiple regression. Then develop a second hypothesis that addresses potential heterogeneity in the first hypothesis and run a second multiple regression.

Here is an example:

Hypothesis 1: Total Cast has a positive effect on Popularity.

Hypothesis 2: The effect of Total Cast on Popularity differs between Adventure and non-Adventure movies.

Export the tables, interpret the results of both regressions and state whether or not they support your hypotheses. You are free to use any variables of interest.

Submission instructions:

Email your submission to Min Zou ([email protected]) and Irakli Barbakadze ([email protected]) by 23:59 on Monday, 17 January 2022.

- A PDF that includes all tables, graphs and text. Note: you need to export your regression outputs using the outreg2 command; screenshots from Stata are not accepted.

- The Stata code (.do file).

File names: MMD010_Assignment_2022_firstname_surname.pdf and MMD010_Assignment_2022_firstname_surname.do.