Presentation slides
Economic Applications of
Big Data & Predictive Analytics
FINM4100
Analytics in Accounting,
Finance and Economics
Week 9
Lesson Learning Outcomes
1 Define and review ideas around micro- and
macroeconomics
2 Review the concept of correlation
3 Analyse macroeconomic data
Why Build Models?
“Just because you have more data doesn’t mean that you’re going to make better decisions.”
Models encapsulate patterns that exist in data, helping us make sense of them.
Christina Zhu, Assistant Professor of Accounting, Wharton School of the University of Pennsylvania
SELTS
• Student feedback is usually done in week 9
• You may be asked to fill in a survey
This Photo by Unknown Author is licensed under CC BY-SA
Software for today
1. Google Colab
• Either
A. watch the teacher demonstrate analytics and accounting in Python, OR
B. run the Python scripts yourself in Google Colab
• If you want to run the code provided, make sure you are signed in to Google Colab: https://colab.research.google.com
2. Exploratory
• Either
A. watch the teacher demonstrate analytics and accounting in Exploratory, OR
B. run each step yourself online (access is explained on the next slide)
Dataset
• Data: countries of the world.csv (1970 to 2017)
• Business problem: which factors affect a country's GDP per capita, and how can we build a model of it using data from many countries?
• We have data on 227 countries and variables (factors) such as GDP, population, literacy, crops (%), birthrate and others.
• We will explore the correlation between each factor and GDP per capita across countries in Python
• Make charts (and try multiple linear regression in Exploratory)
This Photo by Unknown Author is licensed under CC BY-SA
What is Economics?
• Economics is the study of how society allocates scarce
resources to satisfy unlimited wants
• We can consider two branches of economics:
▪ Microeconomics is the study of how single economic
units of society make economic decisions
▪ Macroeconomics is the study of how an aggregated
economy makes economic decisions
What is Economics?
Economics is the study of how society allocates scarce resources to satisfy unlimited wants
[Diagram: Economics, linked to “Production, distribution and consumption” and to “Scarcity, choice and decision making”]
Microeconomics
Focus:
• How individual consumers and companies make decisions
• How they respond to changes in price
• Why different goods have different prices
• How humans may trade in an optimal way
Typical topics in this area are:
• Demand and supply
• Costs of producing goods (production, revenue and costs)
• Market structure, e.g. perfect competition
This Photo by Unknown Author is licensed under CC BY-ND
Macroeconomics
Focus:
The overall economy of a region, e.g. a country, using aggregated data
Typical topics in this area are:
• Economic cycles
• Economic growth
• Fiscal and monetary policy
• Unemployment rates
• Gross Domestic Product (GDP) which is a broad measure of a
country’s economic performance
This Photo by Unknown Author is licensed under CC BY
We will be analysing GDP data today
Why is Economic Growth important?
• It is an indicator of a healthy economy
• One theory says increasing GDP leads to more employment in some sectors
• It leads to a better standard of living
• Key components of economic growth are thought to be:
– Natural resources
– Infrastructure
– Population/labour
– Human capital
– Technology
– Law
This Photo by Unknown Author is licensed under CC BY-SA-NC
GDP per capita 2021
How are we doing?
Activity 1: Think – pair – share
Economics
• Watch the video below which compares micro- and macro- economics
• https://www.youtube.com/watch?v=nJbWj_kHCJQ
• Form pairs
• Person 1 will explain macroeconomics to person 2, then person 2 will explain microeconomics to person 1
• Report back to class with comments and questions
Review of concepts
• Before analysing today’s data, we need to review the ideas of:
– covariance and correlation
– correlation heatmaps
This Photo by Unknown Author is licensed under CC BY
Two Measures of Association
▪ Covariance (is there any pattern to the way two variables
move together?)
a. Only concerned with the direction of the relationship
b. No causal effect is implied
c. Is affected by units of measurement
▪ Correlation coefficient which incorporates part of the
covariance formula (how strong is the linear relationship
between two variables?)
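The point that covariance is affected by units of measurement, while correlation is not, can be checked numerically. The sketch below uses NumPy's `np.cov` and `np.corrcoef` on a handful of made-up values (not from the course dataset): expressing GDP per capita in thousands of dollars instead of dollars divides the covariance by 1000 but leaves the correlation coefficient unchanged.

```python
import numpy as np

# Made-up illustrative values for five hypothetical countries (not course data):
# GDP per capita in dollars and literacy rate in percent
gdp_dollars = np.array([800.0, 2500.0, 6000.0, 15000.0, 30000.0])
literacy_pct = np.array([40.0, 65.0, 80.0, 95.0, 99.0])

# The same GDP values expressed in thousands of dollars
gdp_thousands = gdp_dollars / 1000

# np.cov returns a 2x2 covariance matrix (sample covariance, divisor n - 1);
# element [0, 1] is the covariance between the two inputs
cov_dollars = np.cov(gdp_dollars, literacy_pct)[0, 1]
cov_thousands = np.cov(gdp_thousands, literacy_pct)[0, 1]

# np.corrcoef returns the 2x2 correlation matrix; element [0, 1] is r
r_dollars = np.corrcoef(gdp_dollars, literacy_pct)[0, 1]
r_thousands = np.corrcoef(gdp_thousands, literacy_pct)[0, 1]

print(f"covariance (dollars):    {cov_dollars:.2f}")
print(f"covariance (thousands):  {cov_thousands:.4f}")
print(f"correlation (dollars):   {r_dollars:.4f}")
print(f"correlation (thousands): {r_thousands:.4f}")
```

The covariance changes with the units, so its size says little on its own; the correlation coefficient stays the same, which is what makes it comparable across variables.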
Correlation coefficient
Also called the standardised covariance; it always lies between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
This Photo by Unknown Author is licensed under CC BY-NC-ND
Visualising correlation coefficient
• Method 1: Correlation heatmap
This Photo by Unknown Author is licensed under CC BY-SA
Visualising correlation coefficient
[Figure: three scatter plots of Y against X, with r = -1.0, r = 0 and r = +0.3]
Method 2: Plots of pairs of variables
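Plots like these can be reproduced with simulated data. The hypothetical helper `pair_with_correlation` below (an illustration, not part of the course scripts) builds a pair of variables whose sample correlation equals a chosen target: it centres and scales `x`, removes from a second random vector `z` the part that is linearly related to `x`, and then mixes the two as `y = rho * x + sqrt(1 - rho**2) * z`.

```python
import numpy as np

def pair_with_correlation(rho, n=200, seed=0):
    """Hypothetical helper: return (x, y) whose sample correlation equals rho
    (up to floating-point rounding). Requires -1 <= rho <= 1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    x = (x - x.mean()) / x.std()            # centre and scale x
    z = rng.normal(size=n)
    z = z - z.mean()
    z = z - (z @ x) / (x @ x) * x           # remove the part of z that is linear in x
    z = z / z.std()                         # z is now uncorrelated with x, same spread
    y = rho * x + np.sqrt(1 - rho**2) * z   # mix to get the target correlation
    return x, y

# The three correlations shown on the slide
for rho in (-1.0, 0.0, 0.3):
    x, y = pair_with_correlation(rho)
    r = np.corrcoef(x, y)[0, 1]
    print(f"target r = {rho:+.1f}, sample r = {r:+.3f}")
```

Each pair could then be drawn with a scatter plot, for example `matplotlib.pyplot.scatter(x, y)`, to get panels with r = -1.0, r = 0 and r = +0.3.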
Formulae for Covariance and Correlation
Measures the relative strength of the linear relationship between two variables
Sample covariance and correlation coefficient:
$$\mathrm{COV}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ is the mean of the x’s and $\bar{y}$ is the mean of the y’s
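The two formulas translate directly into code. The sketch below is a plain-Python transcription (the function names are our own) that can be checked by hand: for x = 1, 2, 3, 4 and y = 2, 4, 6, 8 the deviation products sum to 10, so COV(x, y) = 10/3, and because y is an exact linear function of x, r = 1. The n - 1 divisors cancel when the covariance is standardised, which is why the formula for r contains none.

```python
import math

def sample_covariance(x, y):
    """COV(x, y) = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r = sum((x_i - x_bar)(y_i - y_bar)) / sqrt(sum((x_i - x_bar)^2) * sum((y_i - y_bar)^2))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(sample_covariance(x, y))  # 10/3, printed as 3.3333333333333335
print(correlation(x, y))        # 1.0
```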
countries of the world.csv data
• In today’s data, some variable names are self-explanatory while others are not
• The file also uses commas instead of dots as decimal separators (we will deal with this later)
• Variables:
– Agriculture
– Industry
– Service
• These three give the share of the labour force in each sector. For example, Agriculture for Liberia is shown as 0,769, which really means 0.769: 76.9% of Liberia’s workforce works in the agricultural sector. Industry and Service are read the same way.
• Climate is a classification from 1 (drier) to 4 (milder)
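Instead of editing the file by hand, pandas can read the decimal commas directly with the `decimal=","` option of `pd.read_csv`. The sketch below uses two made-up rows and an assumed subset of column names (check them against your copy with `df.columns`); in practice you would pass the path of your downloaded file instead of `io.StringIO(csv_text)`.

```python
import io
import pandas as pd

# Two made-up rows in the style of the course file (values are illustrative only).
# Assumption: these column names match your copy of "countries of the world.csv".
csv_text = '''Country,Region,GDP ($ per capita),Phones (per 1000),Birthrate,Agriculture,Industry,Service
Country A,REGION X,1000,"2,3","44,8","0,769","0,054","0,177"
Country B,REGION Y,25000,"550,0","10,5","0,020","0,250","0,730"
'''

# decimal="," tells pandas that "0,769" means 0.769
df = pd.read_csv(io.StringIO(csv_text), decimal=",")

print(df.dtypes)
# As shares of a whole, the three sector columns should add up to about 1
print(df[["Agriculture", "Industry", "Service"]].sum(axis=1))
```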
Activity: open the script and run it, or watch the demo
• Download the data file countries of the world.csv to a directory of your choice
• Open the script below
https://colab.research.google.com/drive/15LsR6QoH858T4e2U4LHFtlzWSLEJrWMG?usp=sharing
• When you run the second block of code, you will be prompted to choose the data file
• Click in the upload box and select your copy of countries of the world.csv
• Run the rest of the script and analyse the output as it is generated, e.g. correlation heatmap, countries with the highest GDP, etc.
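We do not reproduce the Colab script here, but the kind of output it produces (a correlation matrix for the heatmap and a list of the countries with the highest GDP per capita) can be sketched in a few lines of pandas. The data frame below holds made-up values for hypothetical countries, and the column names are assumptions modelled on the course file.

```python
import pandas as pd

# Made-up illustrative data standing in for the course file (assumed column names)
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D", "Country E"],
    "GDP ($ per capita)": [700, 2000, 5500, 18000, 32000],
    "Phones (per 1000)": [3.0, 40.0, 150.0, 450.0, 650.0],
    "Birthrate": [46.0, 35.0, 22.0, 12.0, 10.0],
    "Agriculture": [0.60, 0.35, 0.15, 0.04, 0.02],
})

# Correlation matrix of the numeric columns (the values a correlation heatmap colours in)
corr = df.select_dtypes(include="number").corr()

# Correlation of each factor with GDP per capita, most positive first
gdp_corr = corr["GDP ($ per capita)"].drop("GDP ($ per capita)").sort_values(ascending=False)
print(gdp_corr)

# Countries with the highest GDP per capita
top3 = df.nlargest(3, "GDP ($ per capita)")[["Country", "GDP ($ per capita)"]]
print(top3)
```

To draw the heatmap itself, one option is seaborn's `sns.heatmap(corr, annot=True)`; any plotting library that can colour a matrix would do.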
Sample Output
Data Modification
• Make a copy of the data file in your folder
• Open the data in Microsoft Excel
• We would normally use a dot as the decimal separator; however, a comma has been used here
• Highlight the data columns that contain commas
• On the Home tab, go to the Editing group
• Click Find & Select and choose Replace
• Enter a comma (,) in Find what and a dot (.) in Replace with, then click Replace All
• Save your file
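If the file has already been read without telling pandas about the decimal commas, the affected columns arrive as text. The Excel Find & Replace step then has a direct pandas counterpart, sketched below on made-up rows; the list `comma_columns` is an assumption you would adjust to the columns in your file.

```python
import io
import pandas as pd

# Made-up rows; reading WITHOUT decimal="," leaves values like "0,769" as text
csv_text = '''Country,Agriculture,Birthrate
Country A,"0,769","44,8"
Country B,"0,020","10,5"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Columns that contain decimal commas (assumed list; adjust to your file)
comma_columns = ["Agriculture", "Birthrate"]

for col in comma_columns:
    # Same idea as Excel's Find & Replace: swap "," for ".", then convert to numbers
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

print(df.dtypes)
print(df)
```

The cleaned data could then be saved with `df.to_csv(...)` and a file name of your choice, mirroring the Save step.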
Data Modification
• In cell U1, create a new column heading called “GDP Low_High”
• In cell U2, type =IF(I2<3000, 0, 1) and press Enter (column I holds GDP per capita)
• Click the bottom-right corner of that cell (the cursor changes to a cross), then hold and drag down the column to cell U228 to copy the formula to every row
• You should see a zero if GDP < $3000 per capita and a one otherwise
• Save your file
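The same 0/1 variable can be created in pandas. The sketch below (made-up values; the column name `GDP ($ per capita)` is assumed to match your file) applies the rule of `=IF(I2<3000, 0, 1)` to every row at once and also builds the True/False version that Exploratory calls logical.

```python
import numpy as np
import pandas as pd

# Made-up GDP per capita values; np.nan stands for a blank cell
df = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "GDP ($ per capita)": [1200, 3000, 27000, np.nan],
})

# Equivalent of =IF(I2<3000, 0, 1): 0 if GDP < 3000, otherwise 1.
# NaN >= 3000 is False, so a missing GDP gets 0, just as Excel treats a
# blank cell as 0 (and 0 < 3000) in the IF formula.
df["GDP Low_High"] = (df["GDP ($ per capita)"] >= 3000).astype(int)

# Exploratory's "logical" type corresponds to a True/False (bool) column
df["GDP Low_High (logical)"] = df["GDP Low_High"].astype(bool)

print(df)
```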
Exploratory
• Access Exploratory
• Start a new project called GDP analysis
• Use Data Frames + to find and import the modified data file
• Change the type of GDP Low_High from numeric to logical before clicking Save
• Select Analytics
• We will work through a simple guided decision tree model; then you can experiment and try to interpret your own
• Instructions for the model type and variables are on the next slide
Exploratory analytics model
• Select Decision Tree as the type
• GDP Low_High as the Target variable
• Phones, birthrate and Agriculture as the predictor variables
• Leave the sample size as is and run the model
• You will see a tree, which is read from the top down
• We will start to interpret this (first see next slide)
Simple Decision Tree
• If you don’t make all variables binary, the model chooses its own thresholds
• The “yes” (positive) branch of each condition is on the right and the “no” (negative) branch is on the left
• Each node’s percentage is the sum of the percentages of the two nodes directly below it, e.g.
• 7% + 4% = 11%
• 11% + 25% = 36%
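To see how a tree can choose a threshold such as 75 phones per 1000 persons on its own, the sketch below searches a single predictor for the split that makes the two groups as pure as possible, using Gini impurity, a criterion used by many CART-style tree implementations. We do not know Exploratory's exact splitting settings, so treat this as an illustration of the idea rather than a reproduction of its output; the data are made up and the function names are our own.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - sum of squared class shares."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - (p1 ** 2 + (1.0 - p1) ** 2)

def best_threshold(values, labels):
    """Try the midpoints between sorted distinct values and return the threshold t
    (and its score) giving the lowest weighted Gini impurity for value < t vs value >= t."""
    n = len(values)
    distinct = sorted(set(values))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v < t]
        right = [lab for v, lab in zip(values, labels) if v >= t]
        # Weight each group's impurity by its share of the countries
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Made-up example: phones per 1000 people and GDP Low_High (1 = GDP >= $3000)
phones = [10, 20, 40, 60, 100, 150, 300, 500]
gdp_high = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, impurity = best_threshold(phones, gdp_high)
print(threshold, impurity)  # 80.0 0.1875
```

A full tree repeats this search over every predictor, keeps the best split, and then repeats the process inside each branch; this is one reason the thresholds can shift when you change the set of predictors.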
Simple Decision Tree
• Rule 1: “< 75 phones per 1000 persons”
• The “no” case is “>= 75 phones per 1000 persons”
• 64% of the countries have >= 75 phones per 1000 persons (dark blue)
• These countries have a 0.92 (92%) chance of having a GDP >= $3000 per capita
• Of the countries with < 75 phones per 1000 persons (36%), only 0.15 (15%) have a GDP >= $3000 per capita
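The figures on this slide can be combined as a consistency check. Every country falls into exactly one of the two branches, so weighting each branch's probability by its share of countries gives the overall share with a GDP >= $3000 per capita in the data the tree was built on (the law of total probability). The inputs are rounded, so the result is approximate.

```python
# Figures read from the tree on this slide (rounded, so the result is approximate)
share_many_phones = 0.64   # countries with >= 75 phones per 1000 persons
p_high_many_phones = 0.92  # chance of GDP >= $3000 per capita in that group
share_few_phones = 0.36    # countries with < 75 phones per 1000 persons
p_high_few_phones = 0.15   # chance of GDP >= $3000 per capita in that group

# The two branches together cover all countries in the tree
assert abs(share_many_phones + share_few_phones - 1.0) < 1e-9

# Weighted average of the branch probabilities (law of total probability)
p_high_overall = (share_many_phones * p_high_many_phones
                  + share_few_phones * p_high_few_phones)
print(round(p_high_overall, 2))  # 0.64
```

If your tree shows a value at its top (root) node, it should be close to this figure.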
Simple Decision Tree
• Rule 2: “Agricultural workforce >= 20%”
• If we split the group with >= 75 phones per 1000 persons further, according to whether the agricultural workforce is >= 20% or not, we find that 59% of countries have >= 75 phones per 1000 persons and an agricultural workforce >= 20%
• This raises the chance of a country having a GDP >= $3000 per capita to 0.96 (96%), given both conditions
Simple Decision Tree
• Rule 3: “Birthrate >= 29” (thought to be roughly 29 births per 1000 people)
• 11% of countries have < 75 phones per 1000 persons and a birth rate < 29 per 1000 people
• These countries have a 43% chance of having a GDP >= $3000 per capita
• 4% of the countries have < 75 phones per 1000 persons, a birth rate < 29 per 1000 people and an agricultural workforce < 16%; 62% of the countries in this group have a GDP >= $3000 per capita
If you look at the “Importance” menu (green), the order of importance is phones, birth rate, agriculture
Decision Tree Exploration
• Try some different combinations of predictor variables and attempt to interpret the results
• You will find that the thresholds change a lot
• Report back to class as needed
This Photo by Unknown Author is licensed under CC BY
Visualising poverty with satellite data
• If time permits (or in your own time), look at the report at
• https://www.kaggle.com/reubencpereira/visualizing-poverty-w-satellite-data/report
• and interact with the maps on Kaggle
• You may have to sign in