Business Analytics RStudio assignment
SUMMARY OF BUSINESS ANALYTICS COURSE PRESENTATION
Dr. HEMANG SUBRAMANIAN
Contents
Regression Equation
Panel regressions
Logit regressions
Probit regressions
Multinomial logit regressions – Choice theory
Unsupervised Machine Learning – k-means clustering
Unsupervised Machine Learning – hierarchical clustering
Supervised Machine Learning – Bayesian classifiers
Regression Equation
The linear equation is specified as follows:
Y = bX + a
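Where:
Y = dependent variable
X = independent variable
b = slope of the regression line (What can you deduce from the slope? It is the change in Y for a one-unit increase in X, and its sign tells you whether Y rises or falls as X increases.)
a = constant, known as the Y-intercept (the value of Y when X = 0)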
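A minimal plain-Python sketch of how a and b are usually estimated, assuming ordinary least squares (the text above does not name an estimation method; in R, lm(y ~ x) gives the same estimates). The function name fit_line and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) for Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = covariance(X, Y) / variance(X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means, so a = mean_y - b * mean_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on Y = 2X + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

A slope of 2.0 means Y increases by 2 for each one-unit increase in X; the intercept 1.0 is the value of Y at X = 0.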
Regression representation
The standard regression equation is the same as the linear equation with one exception: the error term.
Y = α + βX + ε
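Where:
Y = dependent variable (a vector holding one value per observation)
α = constant term (intercept)
β = slope, or regression coefficient (a vector of coefficients when there are several independent variables)
X = independent variables (a matrix with one row per observation and one column per variable)
ε = error term (the part of Y not explained by α + βX)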
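When X is a matrix, the least-squares coefficients solve the normal equations (XᵀX)β = XᵀY. The sketch below is a minimal plain-Python illustration under that assumption: a column of 1s in X plays the role of α, the data and the function name ols are hypothetical, and the residuals are the estimated error terms ε.

```python
def ols(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gaussian elimination.

    X is a list of rows; each row starts with 1.0 so beta[0] is the constant alpha.
    """
    k = len(X[0])
    # Build X'X (k x k) and X'y (length k)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y]
    m = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (m[i][k] - s) / m[i][i]
    return beta

# Hypothetical data: Y = 1 + 2*x1 - 1*x2 plus small errors eps
X = [[1.0, 0, 1], [1.0, 1, 0], [1.0, 2, 3], [1.0, 3, 1], [1.0, 4, 2], [1.0, 5, 5]]
eps = [0.1, -0.1, 0.05, -0.05, 0.0, 0.0]
y = [1 + 2 * r[1] - r[2] + e for r, e in zip(X, eps)]
beta = ols(X, y)
# Estimated error terms: observed Y minus fitted alpha + beta * X
residuals = [yi - sum(b * xij for b, xij in zip(beta, row)) for row, yi in zip(X, y)]
```

Because the model includes a constant, the least-squares residuals sum to zero; the estimates land close to, but not exactly on, the true (1, 2, -1) because of the added errors.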
What is a panel?
Repeated observations of an individual i over time periods t are commonly referred to as a panel (panel data).
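For example, with one row per individual i and time period t (dvar = dependent variable, ivar1, ivar2, … = independent variables, with subscripts _i_t):
Obsid = 1, time 1, dvar_1_1, ivar1_1_1, ivar2_1_1, …
Obsid = 1, time 2, dvar_1_2, ivar1_1_2, ivar2_1_2, …
Obsid = 2, time 1, dvar_2_1, ivar1_2_1, ivar2_2_1, …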
The Linear Panel Model
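The basic linear panel model used in analytics can be described through restrictions of the following general model:
$y_{it} = \alpha_{it} + \beta_{it}^\top x_{it} + u_{it}$
where $i = 1, \dots, n$ indexes individuals, $t = 1, \dots, T$ indexes time periods, $x_{it}$ is the vector of independent variables, and $u_{it}$ is a random disturbance term with mean 0.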
Standard Linear Model
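Of course, the general model is not estimable with N = n × T data points: it has a separate intercept and slope vector for every observation, so there are more parameters than data points.
Therefore, the coefficients are assumed to be homogeneous ($\alpha_{it} = \alpha$ and $\beta_{it} = \beta$ for all i and t), which gives the standard linear model that pools all the data:
$y_{it} = \alpha + \beta^\top x_{it} + u_{it}$
Here $\alpha$ and $\beta$ are the common (mean) coefficients; they do not vary across individuals or over time.
To model individual heterogeneity, the error term is split into two components:
$u_{it} = \mu_i + \epsilon_{it}$
where $\mu_i$ is an individual-specific effect that does not change over time and $\epsilon_{it}$ is the remaining idiosyncratic error with mean 0.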
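One common way to handle a time-invariant individual effect in y_it = α + β x_it + μ_i + ε_it is the fixed-effects (within) estimator; it is used here as an illustration and is not necessarily the estimator the course applies (in R, the plm package's plm() function offers it via model = "within"). Subtracting each individual's time average from y and x cancels μ_i (and α), because both are constant over t. The sketch assumes a single regressor, hypothetical data, and a hypothetical function name.

```python
from collections import defaultdict

def within_estimator(panel):
    """Fixed-effects (within) slope for y_it = alpha + beta * x_it + mu_i + e_it.

    panel: list of (i, t, x, y) tuples with one regressor x.
    """
    # Running sums of x, y and the number of periods for each individual i
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for i, _t, x, y in panel:
        s = sums[i]
        s[0] += x
        s[1] += y
        s[2] += 1
    means = {i: (sx / n, sy / n) for i, (sx, sy, n) in sums.items()}
    # Demeaning removes mu_i (and alpha) because both are constant over t
    num = den = 0.0
    for i, _t, x, y in panel:
        mx, my = means[i]
        num += (x - mx) * (y - my)
        den += (x - mx) ** 2
    return num / den

# Hypothetical panel: 3 individuals x 3 periods, true beta = 2, no idiosyncratic error
mu = {1: 5.0, 2: -3.0, 3: 10.0}
panel = [(i, t, float(t + i), 2.0 * (t + i) + mu[i]) for i in mu for t in (1, 2, 3)]
print(within_estimator(panel))  # 2.0
```

The individual effects μ_i differ a lot across individuals, yet the within slope recovers β exactly, because only variation over time inside each individual is used.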
Introduction
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).
Dichotomous means the outcome can take only one of two values, for example:
True / False
1/0
Yes/No
Male/Female
College degree/No college degree
Relapse of disease/ No relapse of disease
Success/Failure
Public company / Private company
Profitable / Not Profitable
Why do it?
Generalized Linear Model
g(E(y)) = α + βx1 + γx2
g() is the link function
E(y) is the mean (expected value) of the target variable
α + βx1 + γx2 is the linear predictor: the regressors x1 and x2 with their coefficients
The goal is to estimate α, β and γ from the data.
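For a binary target, logistic regression uses the logit as the link g(), so E(y), a probability, is recovered by applying the inverse link to the linear predictor. This minimal sketch assumes hypothetical coefficient and regressor values; it only shows how g() and its inverse connect E(y) to α + βx1 + γx2.

```python
import math

def logit(p):
    """Logit link g(p) = ln(p / (1 - p)); maps (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: maps any linear predictor eta back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients and regressor values
alpha, beta, gamma = -1.0, 0.5, 2.0
x1, x2 = 2.0, 0.25
eta = alpha + beta * x1 + gamma * x2  # linear predictor, here 0.5
p = inv_logit(eta)                    # E(y) for a binary y, about 0.62
```

Applying logit(p) returns the linear predictor, which is why the model is linear on the link scale even though p itself is not linear in x1 and x2.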
Example of using Logistic regression
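Say you have 1000 customers and want to predict the probability that a customer will buy a particular magazine.
g(y) = β0 + β(Age)
This is a simple linear regression model with Age as the independent variable.
However, we want the probability of the outcome (the dependent variable), and β0 + β(Age) can take any real value.
The problem is that a probability must lie between 0 and 1. Exponentiating makes the right-hand side positive:
p = exp(β0 + β(Age)) = e^(β0 + β(Age))
Dividing by one plus the same quantity also keeps p below 1, which gives the logistic function:
p = e^(β0 + β(Age)) / (1 + e^(β0 + β(Age)))
Equivalently, the link function is the logit: g(p) = ln(p / (1 − p)) = β0 + β(Age).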
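The magazine example can be fitted by maximum likelihood, modeling p = e^(β0 + β(Age)) / (1 + e^(β0 + β(Age))). The sketch below is a plain-Python illustration that assumes simple gradient ascent (in R, glm(..., family = binomial) is the usual tool); the ten customers, the learning rate and the function names are hypothetical. Age is standardized inside the fit so that a fixed step size converges, and the coefficients are converted back to the Age scale at the end.

```python
import math

def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1 / (1 + math.exp(-z))

def fit_logistic(ages, bought, lr=1.0, steps=5000):
    """Maximize the log-likelihood of p = sigmoid(b0 + b1 * age) by gradient ascent."""
    n = len(ages)
    mean = sum(ages) / n
    sd = (sum((a - mean) ** 2 for a in ages) / n) ** 0.5
    z = [(a - mean) / sd for a in ages]  # standardized Age
    b0 = b1 = 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood: sums of (y - p) and (y - p) * z
        g0 = g1 = 0.0
        for zi, yi in zip(z, bought):
            r = yi - sigmoid(b0 + b1 * zi)
            g0 += r
            g1 += r * zi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    # Undo the standardization: b0 + b1 * (age - mean) / sd
    beta = b1 / sd
    beta0 = b0 - b1 * mean / sd
    return beta0, beta

def predict_prob(beta0, beta, age):
    return sigmoid(beta0 + beta * age)

# Hypothetical customers: age and whether they bought the magazine (1) or not (0)
ages = [22, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bought = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
beta0, beta = fit_logistic(ages, bought)
```

The data are deliberately not perfectly separable by age; with perfectly separated outcomes the maximum-likelihood coefficients would grow without bound.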
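Probit regression
In probability theory and statistics, the probit function is the inverse of the cumulative distribution function (CDF), or quantile function, of the standard normal distribution.
It has applications in exploratory statistical graphics and in specialized regression modeling of binary response variables.
In probit regression, the probability of a positive outcome is modeled as
$\Pr(Y = 1 \mid X) = \Phi(\alpha + \beta X)$
where $\Phi$ is the standard normal CDF, so the probit of $\Pr(Y = 1 \mid X)$ is the linear predictor $\alpha + \beta X$.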
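A small sketch of the two functions involved, using Python's standard-library statistics.NormalDist for the standard normal CDF Φ and its inverse (the probit). The coefficient values and function names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def probit_prob(alpha, beta, x):
    """Prob(Y = 1 | x) = Phi(alpha + beta * x), with Phi the standard normal CDF."""
    return std_normal.cdf(alpha + beta * x)

def probit(p):
    """Probit function: the inverse of Phi (the standard normal quantile function)."""
    return std_normal.inv_cdf(p)

# Hypothetical coefficients: alpha + beta * x = -1.0 + 0.5 * 2.0 = 0, so Phi(0) = 0.5
p = probit_prob(-1.0, 0.5, 2.0)
```

For example, probit(0.975) is about 1.96, the familiar two-sided 95% critical value of the normal distribution.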
Choice theory
A biologist may be interested in food choices that alligators make. Adult alligators might have different preferences from young ones. The outcome variable here will be the types of food, and the predictor variables might be size of the alligators and other environmental variables.
Entering high school students choose among a general program, a vocational program and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.
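The multinomial logistic model is as follows. With J possible choices and choice 1 as the baseline, the probability that individual i selects choice j is
$P(Y_i = j \mid x_i) = \dfrac{\exp(\beta_j^\top x_i)}{\sum_{k=1}^{J} \exp(\beta_k^\top x_i)}$, with $\beta_1 = 0$.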
CHOICE RESULTS
Multinomial choice logit models describe the relative probability of selecting each choice from a given choice set, as a function of a set of explanatory variables.
Unlike a binary logit, which handles only two outcomes, multinomial choice logit models can explain selections among three or more alternatives.
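A minimal sketch of how a fitted multinomial logit turns coefficients into choice probabilities, assuming the usual form P(Y = j | x) = exp(β_j·x) / Σ_k exp(β_k·x) with the baseline choice's coefficients fixed at zero. The program-choice coefficients below are invented for illustration, not estimates from any data. The relative choice probability of choice j versus the baseline is exp(β_j·x).

```python
import math

def multinomial_logit_probs(x, betas):
    """Choice probabilities P(Y = j | x) = exp(b_j . x) / sum_k exp(b_k . x).

    betas: one coefficient vector per choice; the baseline choice uses all zeros.
    """
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients; x = (constant, writing score, socioeconomic status)
betas = {
    "general": [0.0, 0.0, 0.0],  # baseline choice
    "academic": [-2.0, 0.05, 0.5],
    "vocational": [1.0, -0.03, -0.2],
}
names = list(betas)
x = [1.0, 55.0, 2.0]
probs = multinomial_logit_probs(x, [betas[c] for c in names])
# Relative choice probability versus the baseline equals exp(b_academic . x)
rel_academic = probs[names.index("academic")] / probs[names.index("general")]
```

Subtracting the largest score leaves the probabilities unchanged (it cancels in the ratio) but avoids overflow when scores are large.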
Why group or cluster data?
How do we determine the groups of customers so as to apply marketing strategies to these groups?
How do we influence the behavior of a group such as millennials?
How can we determine the types of criminal or felony behaviors that exist in the world?
How can we classify a large set of birds into their species?
Clustering Techniques
Different methods for clustering
Regular
K-means clustering
K-medians clustering
Hierarchical
Agglomerative clustering
Divisive clustering
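K-means, the first method listed, can be sketched with Lloyd's algorithm: assign each point to its nearest center, move each center to the mean of its points, and repeat until the centers stop moving. This plain-Python sketch assumes Euclidean distance, user-supplied starting centers (so k is fixed in advance) and hypothetical 2-D points; R's built-in kmeans() function implements the same idea.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centers stop moving.

    points: list of tuples; centers: initial centers (their count is k).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center where it was
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical data with two obvious groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, [points[0], points[3]])
```

The result depends on the starting centers, which is why k-means is commonly run several times from different starts.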
Hierarchical clustering
Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in the dataset.
It does not require us to pre-specify the number of clusters to be generated as is required by the k-means approach.
Furthermore, hierarchical clustering results in an attractive tree-based representation of the observations, called a dendrogram.
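A minimal sketch of bottom-up (agglomerative) hierarchical clustering: start with every point as its own cluster and repeatedly merge the two closest clusters. The merge history (which clusters joined, and at what dissimilarity) is exactly what a dendrogram draws. It assumes Euclidean distance, complete linkage by default and hypothetical points; it is a brute-force illustration for small data, while R's hclust() or cluster::agnes() are the practical tools.

```python
import math
from itertools import combinations

def agglomerative(points, linkage=max):
    """Bottom-up clustering; linkage=max gives complete linkage, linkage=min single linkage.

    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    clusters = [(i,) for i in range(len(points))]  # each point starts alone
    merges = []
    while len(clusters) > 1:
        def dissim(a, b):
            # Linkage applied to all pairwise distances between the two clusters
            return linkage(math.dist(points[i], points[j]) for i in a for j in b)
        a, b = min(combinations(clusters, 2), key=lambda pair: dissim(*pair))
        merges.append((a, b, dissim(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical points: two tight pairs and one outlier
pts = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
merges = agglomerative(pts)
```

Cutting the tree at a chosen height gives any number of clusters after the fact, which is why the number of clusters need not be fixed in advance.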
Rule of thumb
AGNES (agglomerative nesting, bottom-up) clustering is good at identifying small clusters.
DIANA (divisive analysis, top-down) clustering is good at identifying large clusters.
Measuring dissimilarity between clusters
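Five common methods to measure the dissimilarity between two clusters are:
Complete linkage (maximum): the largest distance between a point in one cluster and a point in the other
Single linkage (minimum): the smallest such pairwise distance
Average linkage (average): the mean of all such pairwise distances
Centroid linkage: the distance between the two cluster centroids
Ward’s minimum variance method: merge the pair of clusters that gives the smallest increase in total within-cluster variance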
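The five dissimilarities can be computed side by side for two small clusters. This is a minimal sketch assuming Euclidean distance and hypothetical points; for Ward's method it returns the increase in total within-cluster sum of squares caused by the merge, n_A·n_B/(n_A + n_B)·‖c_A − c_B‖². Software packages may report a rescaled version of the Ward criterion (for example, R's hclust() offers "ward.D" and "ward.D2").

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def linkage_dissimilarities(A, B):
    """Dissimilarity between clusters A and B (lists of points) under five linkages."""
    pair_d = [math.dist(a, b) for a in A for b in B]
    ca, cb = centroid(A), centroid(B)
    return {
        "complete": max(pair_d),               # largest pairwise distance
        "single": min(pair_d),                 # smallest pairwise distance
        "average": sum(pair_d) / len(pair_d),  # mean pairwise distance
        "centroid": math.dist(ca, cb),         # distance between cluster centroids
        # Ward: increase in total within-cluster sum of squares if A and B merge
        "ward": len(A) * len(B) / (len(A) + len(B)) * math.dist(ca, cb) ** 2,
    }

# Hypothetical clusters: two vertical pairs of points, 3 units apart
d = linkage_dissimilarities([(0, 0), (0, 2)], [(3, 0), (3, 2)])
```

Here single linkage and centroid linkage both give 3.0, complete linkage gives the diagonal √13, and the Ward increase is 9.0.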
Bayes' theorem
Given a hypothesis H and evidence E, Bayes' theorem relates the probability of the hypothesis before getting the evidence, P(H), to the probability of the hypothesis after getting the evidence, P(H|E):
P(H|E) = [P(E|H) / P(E)] × P(H)
This matters because the probability is updated each time there is new evidence, or in this case new features.
The factor P(E|H)/P(E) is what updates P(H) each time new results show up. It is sometimes loosely called the likelihood ratio; strictly, the likelihood ratio is P(E|H)/P(E|not H).
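A worked numeric sketch of the update, with hypothetical probabilities. It assumes P(E) is obtained from the law of total probability, P(E) = P(E|H)P(H) + P(E|not H)P(not H), and that the second piece of evidence is conditionally independent of the first given H and given not H, so the posterior after the first update can serve as the prior for the second.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) = [P(E|H) / P(E)] * P(H),
    with P(E) = P(E|H) P(H) + P(E|not H) P(not H)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h / p_e * prior  # the factor P(E|H)/P(E) updates the prior

# Hypothetical numbers: P(H) = 0.01, P(E|H) = 0.9, P(E|not H) = 0.05
posterior = bayes_update(0.01, 0.9, 0.05)         # 2/13, about 0.154
# A second, conditionally independent piece of evidence updates it again
posterior2 = bayes_update(posterior, 0.9, 0.05)   # 36/47, about 0.766
```

A rare hypothesis stays fairly unlikely after one positive piece of evidence, but repeated evidence pushes the probability up quickly.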
Naïve Bayesian Classifier
Setup:
Data D = (x(1), y(1)), (x(2), y(2)), …, (x(n), y(n))
x(i) ∈ R^d (a vector of d features)
y(i) ∈ Y, some finite set of classes
Assumptions for a probabilistic model:
1. A family of distributions p_θ, parametrized by θ, such that p_θ(x, y) = p_θ(x | y) p_θ(y) = p_θ(x1 | y) ⋯ p_θ(xd | y) p_θ(y)
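Here the pairs (x(1), y(1)), …, (x(n), y(n)) are independent and identically distributed draws from p_θ, and the features x1, …, xd are assumed to be conditionally independent given y (the "naïve" assumption).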
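Problem: for a new x, predict its y.
Algorithm:
Estimate θ from D (for example, by maximum likelihood).
Compute
ŷ = argmax_y p_θ(y | x) = argmax_y p_θ(x | y) p_θ(y) / p_θ(x) = argmax_y p_θ(y) p_θ(x1 | y) ⋯ p_θ(xd | y)
The denominator p_θ(x) can be dropped because it does not depend on y.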
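The two algorithm steps can be sketched for categorical features. This assumes p_θ(y) and each p_θ(x_j | y) are estimated from counts with add-one (Laplace) smoothing; the source does not fix the family of distributions, so that choice, the function names and the magazine-style data are all hypothetical. Log probabilities are summed instead of multiplying raw probabilities, which gives the same argmax while avoiding underflow with many features.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate theta from D: class priors p(y) and per-feature tables p(x_j | y).

    Categorical features; alpha is the Laplace smoothing constant.
    """
    class_counts = Counter(y)
    n, d = len(y), len(X[0])
    values = [sorted({row[j] for row in X}) for j in range(d)]  # observed levels per feature
    counts = defaultdict(Counter)  # (class, feature index) -> counts of each value
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[(c, j)][v] += 1
    prior = {c: k / n for c, k in class_counts.items()}

    def cond(j, v, c):
        # Smoothed estimate of p(x_j = v | y = c)
        return (counts[(c, j)][v] + alpha) / (class_counts[c] + alpha * len(values[j]))

    return prior, cond, d

def predict(model, x):
    """y_hat = argmax_y p(y) * prod_j p(x_j | y), computed with logs."""
    prior, cond, d = model

    def log_score(c):
        return math.log(prior[c]) + sum(math.log(cond(j, x[j], c)) for j in range(d))

    return max(prior, key=log_score)

# Hypothetical training data: (age group, income) -> bought the magazine?
X_train = [("young", "low"), ("young", "high"), ("old", "high"),
           ("old", "high"), ("old", "low"), ("young", "low")]
y_train = ["no", "no", "yes", "yes", "yes", "no"]
model = fit_naive_bayes(X_train, y_train)
```

With these counts, predict(model, ("old", "high")) returns "yes" (score 0.5 × 0.8 × 0.6 = 0.24 against 0.5 × 0.2 × 0.4 = 0.04 for "no").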