Finance research report based on provided data
Data Analysis – 1 Introduction
FINA305/405
Is economics science or art?
Conflicts between theory and practice
Conflicts between forecasts and realizations
Why analyze data?
Researchers: To discover reliable business knowledge
Employees: To make your reports believable
Governments: To evaluate policy effects
Bottom line: To think in an innovative and interesting way
The key/challenge of data analysis
Causality
Pure effect
Ceteris paribus
Holding other factors constant
Data Analysis – 2 Fundamentals
Agenda
Function notation
Types of variables
Types of data
Statistics
Practice in Excel
Function notation
Often we are interested in the relationship between two or more variables, denoted using a function: 𝑌 = 𝑓(𝑋)
Linear function: 𝑌 = 𝛼 + 𝛽𝑋
Non-linear function:
We can think of our world as a data-generating function.
However, we do not know its true functional form.
We want to estimate it!
Terminology
𝑌: dependent variable or explained variable
𝑋: independent variable or explanatory variable
𝛼: intercept term
the value of 𝑌 when 𝑋 = 0
𝛽: coefficient or slope
a one-unit change in 𝑋 is associated with a 𝛽 change in 𝑌
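A minimal sketch of how an intercept and a slope could be estimated from data. The slides do not name an estimation method at this point, so ordinary least squares is an assumption here, and the education and wage numbers are hypothetical values made up for illustration.

```python
# Hypothetical data for illustration only (not from the slides):
# X = years of education, Y = hourly wage.
x = [8, 10, 12, 14, 16]
y = [11.0, 13.5, 15.0, 18.5, 20.0]


def fit_line(x, y):
    """Return least-squares estimates (intercept, slope) for Y = intercept + slope * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept_hat, slope_hat = fit_line(x, y)
print(f"intercept = {intercept_hat:.3f}, slope = {slope_hat:.3f}")
```

The estimated slope is read exactly as in the terminology above: a one-unit change in X is associated with a change of that size in Y.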
Example
Types of variables
Numerical
Discrete (binary or multinomial)
number of bedrooms in a house, number of children in a family
Continuous
salary, GDP, education, etc.
Categorical
Nominal scale:
Marital status: not married, married, de facto, divorced, etc.
Ordinal scale: ranking
Evaluation scale: A+, A, A-, B+, …
Types of data
Cross-sectional data
Time-series data
Pooled cross-sectional data
Panel data
Cross-sectional data
| YEAR | CEO Name | Gender | Age | Salary | Total Compensation | Industry Name |
| 2009 | David P. Storch | MALE | 57 | 799.208 | 4955.641 | Aerospace & Defense |
| 2009 | Robert E. Switz | MALE | 62 | 695.711 | 2521.879 | Communications Equipment |
| 2009 | Gerard Arpey | MALE | 50 | 669.646 | 3765.152 | Airlines |
| 2009 | John S. Gilbertson | MALE | 66 | 706 | 879.3 | Electronic Components |
| 2009 | Donald E. Brandt | MALE | 54 | 890.568 | 3997.26 | Electric Utilities |
Time-series data
| YEAR | CEO Name | Gender | Age | Salary | Total Compensation | Industry Name |
| 2005 | David P. Storch | MALE | 53 | 716.6 | 12728.39 | Aerospace & Defense |
| 2006 | David P. Storch | MALE | 54 | 741.5 | 12855.4 | Aerospace & Defense |
| 2007 | David P. Storch | MALE | 55 | 768.248 | 8326.946 | Aerospace & Defense |
| 2008 | David P. Storch | MALE | 56 | 791.295 | 3313.996 | Aerospace & Defense |
| 2009 | David P. Storch | MALE | 57 | 799.208 | 4955.641 | Aerospace & Defense |
Pooled cross-sectional data
| YEAR | CEO Name | Gender | Age | Salary | Total Compensation | Industry Name |
| 2009 | David P. Storch | MALE | 57 | 799.208 | 4955.641 | Aerospace & Defense |
| 2009 | Robert E. Switz | MALE | 62 | 695.711 | 2521.879 | Communications Equipment |
| 2009 | Gerard Arpey | MALE | 50 | 669.646 | 3765.152 | Airlines |
| 2009 | John S. Gilbertson | MALE | 66 | 706 | 879.3 | Electronic Components |
| 2009 | Donald E. Brandt | MALE | 54 | 890.568 | 3997.26 | Electric Utilities |
| 2008 | V. James Marino | MALE | 58 | 856.25 | 3681.247 | Personal Products |
| 2008 | Stanley M. Kuriyama | MALE | 54 | 400 | 669.394 | Marine |
| 2008 | Paul J. Evanson | MALE | 66 | 1121.343 | 45342.249 | Electric Utilities |
| 2008 | David Cote | MALE | 54 | 1825.962 | 20090.174 | Aerospace & Defense |
| 2008 | David J. Aldrich | MALE | 51 | 583.404 | 1783.255 | Semiconductors |
Panel data
| YEAR | CEO Name | Gender | Age | Salary | Total Compensation | Industry Name |
| 2009 | David P. Storch | MALE | 57 | 799.208 | 4955.641 | Aerospace & Defense |
| 2009 | Robert E. Switz | MALE | 62 | 695.711 | 2521.879 | Communications Equipment |
| 2009 | Gerard Arpey | MALE | 50 | 669.646 | 3765.152 | Airlines |
| 2009 | John S. Gilbertson | MALE | 66 | 706 | 879.3 | Electronic Components |
| 2009 | Donald E. Brandt | MALE | 54 | 890.568 | 3997.26 | Electric Utilities |
| 2008 | David P. Storch | MALE | 56 | 791.295 | 3313.996 | Aerospace & Defense |
| 2008 | Robert E. Switz | MALE | 61 | 742.415 | 2768.56 | Communications Equipment |
| 2008 | Gerard Arpey | MALE | 49 | 666.348 | 4039.601 | Airlines |
| 2008 | John S. Gilbertson | MALE | 65 | 706 | 968.324 | Electronic Components |
| 2008 | Donald E. Brandt | MALE | 53 | 725 | 1730.574 | Electric Utilities |
Summary
Cross-sectional data
Observations on multiple entities collected at a single point in time
Time-series data
A series of observations on one entity over successive periods of time
Pooled cross-sectional data
Cross sections from several periods combined, with different entities in each period
Panel data
Cross sections from several periods combined, with the same entities followed in each period
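The four definitions can be sketched as a small classifier over (entity, period) pairs. The function name `classify` and the shortened CEO names are illustrative choices, not part of the course material, and the rule is a simplification: an unbalanced panel, for example, would be labelled pooled cross-sectional here.

```python
def classify(observations):
    """Classify a data set from its (entity, period) pairs."""
    entities = {entity for entity, _ in observations}
    periods = {period for _, period in observations}
    if len(periods) == 1:
        return "cross-sectional"  # many entities, one point in time
    if len(entities) == 1:
        return "time-series"  # one entity, many periods
    # Several entities and several periods: check whether every period
    # contains exactly the same set of entities.
    same_entities = all(
        {e for e, p in observations if p == period} == entities
        for period in periods
    )
    return "panel" if same_entities else "pooled cross-sectional"


# Entity/year pairs taken from the example tables (names shortened)
cross = [("Storch", 2009), ("Switz", 2009), ("Arpey", 2009)]
series = [("Storch", year) for year in range(2005, 2010)]
pooled = [("Storch", 2009), ("Switz", 2009), ("Marino", 2008), ("Kuriyama", 2008)]
panel = [("Storch", 2009), ("Switz", 2009), ("Storch", 2008), ("Switz", 2008)]

for name, data in [("cross", cross), ("series", series), ("pooled", pooled), ("panel", panel)]:
    print(name, "->", classify(data))
```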
Notations
Subscripts (i or t) are used to denote different observations of a variable
We use subscript i for cross-sectional observations (e.g. states, individuals) and t for time-series observations (e.g. years, quarters, months)
Summation operator, capital sigma: $\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$
Statistics
Mean vs median
Mean: the average value of the entire set of numbers ($\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$).
Median: the middle value when the numbers are sorted from smallest to largest.
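A short sketch of the difference, using the Total Compensation values of the five 2008 CEOs from the pooled cross-sectional table (units as given in that table). Python's standard `statistics` module is used here only for illustration; the course itself practises in Excel.

```python
import statistics

# Total Compensation of the five 2008 CEOs in the pooled cross-sectional table
total_comp = [3681.247, 669.394, 45342.249, 20090.174, 1783.255]

mean_comp = statistics.mean(total_comp)
median_comp = statistics.median(total_comp)

print(f"mean   = {mean_comp:.3f}")
print(f"median = {median_comp:.3f}")
```

Two very large values pull the mean well above the median, which is why the median is often preferred for skewed data such as pay.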
Variance vs standard deviation
Both measure the spread (dispersion) of the numbers in a data set, that is, how far each number is from the mean. The standard deviation is the square root of the variance.
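A minimal sketch of both measures, using the 2009 Salary column from the cross-sectional table. The slides do not say whether the sample (divide by n - 1) or the population (divide by n) version is intended; the sample version is assumed here.

```python
import math

# 2009 salaries from the cross-sectional table (units as given there)
salaries = [799.208, 695.711, 669.646, 706.0, 890.568]


def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1 (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


var_salary = sample_variance(salaries)
sd_salary = math.sqrt(var_salary)  # standard deviation = square root of variance

print(f"variance = {var_salary:.3f}")
print(f"standard deviation = {sd_salary:.3f}")
```

The standard deviation is usually easier to interpret because it is expressed in the same units as the data, whereas the variance is in squared units.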
Example
Theoretical distribution vs histogram
A theoretical distribution is a function showing all the possible values of the data and how often they occur.
A histogram (frequency plot) is a graphical representation of the distribution of numerical data. It is an estimate of the theoretical distribution.
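A histogram is built by counting how many observations fall into each bin. The sketch below does this for the ten Age values in the pooled cross-sectional table; the bin start (50), bin width (5) and the helper name `histogram_counts` are arbitrary illustrative choices.

```python
# Ages of the ten CEOs in the pooled cross-sectional table
ages = [57, 62, 50, 66, 54, 58, 54, 66, 54, 51]


def histogram_counts(values, start, width, n_bins):
    """Count the values falling in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the bin range are ignored
            counts[idx] += 1
    return counts


counts = histogram_counts(ages, start=50, width=5, n_bins=4)

# Text version of the histogram: one '#' per observation
for k, count in enumerate(counts):
    low = 50 + 5 * k
    print(f"{low}-{low + 4}: {'#' * count}")
```

With only ten observations the shape is a rough estimate; more data gives a histogram that tracks the underlying distribution more closely.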
Example
Correlation
Measures the strength and direction of the linear relationship between two variables.
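A minimal sketch of the coefficient of correlation r, computed as the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` is an illustrative choice; the Age and Salary values come from the 2009 cross-sectional table.

```python
import math


def pearson_r(x, y):
    """Coefficient of correlation between two equally long lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Age and Salary of the five CEOs in the 2009 cross-sectional table
age = [57, 62, 50, 66, 54]
salary = [799.208, 695.711, 669.646, 706.0, 890.568]

r_age_salary = pearson_r(age, salary)
print(f"r(age, salary) = {r_age_salary:.3f}")
```

Five observations are far too few to draw conclusions; the point is only to show the arithmetic.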
Coefficient of correlation values
-1.0: Perfect negative correlation
-0.5: Increasing degree of negative correlation (moving from 0 toward -1.0)
0: No correlation
+0.5: Increasing degree of positive correlation (moving from 0 toward +1.0)
+1.0: Perfect positive correlation
Coefficient of correlation plots
[Figure: scatter plots of Y against X; panel titles include r = 1, r = -1 and r = 0]
Correlation matrix
A correlation matrix shows the correlation of pairs of variables.
| | Value $ | Land Area | Rooms | Building Area |
| Value $ | 1 | | | |
| Land Area | 0.00045 | 1 | | |
| Rooms | 0.60722 | 0.19927 | 1 | |
| Building Area | 0.70607 | -0.04193 | 0.86599 | 1 |
Diagonal entries are always 1 (each variable is perfectly correlated with itself)
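A correlation matrix can be sketched by applying the coefficient of correlation to every pair of columns. The example below uses Age, Salary and Total Compensation from the 2009 cross-sectional table; the helper names `pearson_r` and `correlation_matrix` are illustrative, and the house-value data behind the matrix above is not available here.

```python
import math


def pearson_r(x, y):
    """Coefficient of correlation between two equally long lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Columns from the 2009 cross-sectional table
columns = {
    "Age": [57, 62, 50, 66, 54],
    "Salary": [799.208, 695.711, 669.646, 706.0, 890.568],
    "Total Compensation": [4955.641, 2521.879, 3765.152, 879.3, 3997.26],
}

matrix = correlation_matrix(columns)

# Print the lower triangle only; the upper triangle mirrors it
names = list(matrix)
for i, row in enumerate(names):
    cells = [f"{matrix[row][col]:.3f}" for col in names[: i + 1]]
    print(f"{row:<20}" + "  ".join(cells))
```

Only the lower triangle needs to be shown because the correlation of A with B equals the correlation of B with A.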
Question
The correlation between education and wages is strong and positive.
Does this mean education “causes” higher wages?
Possible answers/theories
Yes.
Education improves skills, skilled workers get better paying jobs.
Not necessarily.
Individuals with high innate ability pursue more education. Innate ability (not education) causes wages to increase. Education is just a signal of ability.
Individuals from rich families get more education, and family wealth may also raise wages through other channels.
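The "not necessarily" answer can be illustrated with a small simulation. Everything below is a hypothetical setup, not course data: innate ability drives both education and wages, and education is given no causal effect on wages by construction. The coefficients, noise levels and the 0.1 ability band are arbitrary choices.

```python
import math
import random


def pearson_r(x, y):
    """Coefficient of correlation between two equally long lists of numbers."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is reproducible
n = 20000

# Hypothetical data-generating process (assumption for illustration):
ability = [random.gauss(0, 1) for _ in range(n)]
education = [a + random.gauss(0, 0.5) for a in ability]  # ability raises education
wage = [2 * a + random.gauss(0, 0.5) for a in ability]   # ability raises wages; education does not enter

# Correlation in the full sample
r_all = pearson_r(education, wage)

# "Holding ability constant": keep only individuals with nearly identical ability
similar = [i for i in range(n) if abs(ability[i]) < 0.1]
r_similar = pearson_r([education[i] for i in similar], [wage[i] for i in similar])

print(f"all individuals:     r = {r_all:.2f}")
print(f"similar ability only: r = {r_similar:.2f}")
```

In the full sample the correlation is strong even though education has no effect on wages in this setup; among individuals of similar ability it largely disappears. This is the ceteris paribus idea: a causal claim requires holding other factors constant.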
Practice in Excel
http://www.rbnz.govt.nz/statistics/key-graphs/key-graph-house-price-values
Importing raw data
Summary statistics
Correlation matrix, Scatter plot
Histogram
Index and natural log transformation
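The last step, index and natural log transformation, can be sketched outside Excel as follows. The price values are hypothetical placeholders, not the RBNZ series, and the helper name `to_index` and the base of 100 in the first period are assumptions for illustration.

```python
import math

# Hypothetical house price values for three periods (not the RBNZ data)
prices = [200.0, 210.0, 231.0]


def to_index(values, base_position=0, base_value=100.0):
    """Rescale a series so that the chosen base period equals base_value."""
    base = values[base_position]
    return [v / base * base_value for v in values]


index = to_index(prices)

# Natural log transformation; differences of logs approximate growth rates
log_prices = [math.log(p) for p in prices]
log_growth = [later - earlier for earlier, later in zip(log_prices, log_prices[1:])]

print("index:", [round(v, 2) for v in index])
print("log growth:", [round(g, 4) for g in log_growth])
```

An index makes series with different units comparable, and for small changes the log difference is close to the percentage growth rate.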
Coefficient of correlation:
$$ r = \frac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\sum (Y_i - \bar{Y})^2 \, \sum (X_i - \bar{X})^2}} $$
[Figure panel label: r = .85]