Business Analytics RStudio assignment

SUMMARY OF BUSINESS ANALYTICS COURSE PRESENTATION

Dr. HEMANG SUBRAMANIAN

Contents

Regression Equation

Panel regressions

Logit regressions

Probit regressions

Multinomial logit regressions – Choice theory

Unsupervised Machine Learning - k-means clustering

Unsupervised Machine Learning - hierarchical clustering

Supervised Machine Learning - Bayesian classifiers

Regression Equation

The linear equation is specified as follows:

Y = bX + a

Where

Y = dependent variable

X = independent variable

a = constant, known as the Y-intercept (the value of Y when X = 0)

b = slope of the regression line

(What can you deduce from the slope?)

Regressions representation

The standard regression equation is the same as the linear equation with one exception: the error term.

Y = α + βX + ε

Where Y = dependent variable (matrix)

α = constant term

β = slope or regression coefficient

X = independent variable (matrix)

ε = error term
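As a rough illustration of the equation above, the sketch below estimates α and β by ordinary least squares on a tiny made-up data set and then recovers the error term as the residuals. The data values and the helper name `fit_line` are assumptions for this example only; they are not part of the course material.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (alpha, beta) that minimise the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha, beta = fit_line(x, y)

# The error term is whatever the fitted line does not explain.
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.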

What is a panel

Repeated observations of an individual ‘i’ over time periods ‘t’ are commonly referred to as a panel.

E.g. (subscripts read individual_time):

Obsid = 1, time = 1, dvar_1_1, ivar1_1_1, ivar2_1_1, …

Obsid = 1, time = 2, dvar_1_2, ivar1_1_2, ivar2_1_2, …

Obsid = 2, time = 1, dvar_2_1, ivar1_2_1, ivar2_2_1, …
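The layout above can be sketched as a long-format table with one row per individual and time period. The field names (`obsid`, `time`, `dvar`, `ivar1`) and all values below are hypothetical; the "balanced" check (every individual observed in every period) is a standard panel-data notion added here for illustration.

```python
from collections import defaultdict

# Hypothetical long-format panel: one row per (individual, time) pair.
rows = [
    {"obsid": 1, "time": 1, "dvar": 10.0, "ivar1": 1.0},
    {"obsid": 1, "time": 2, "dvar": 12.0, "ivar1": 2.0},
    {"obsid": 2, "time": 1, "dvar": 20.0, "ivar1": 1.5},
    {"obsid": 2, "time": 2, "dvar": 23.0, "ivar1": 3.0},
]

# Index the rows as panel[individual][time] -> observation.
panel = defaultdict(dict)
for row in rows:
    panel[row["obsid"]][row["time"]] = row

n = len(panel)                               # number of individuals
T = len({row["time"] for row in rows})       # number of time periods
# A panel is balanced when every individual is observed in every period.
is_balanced = all(len(series) == T for series in panel.values())

print(f"n = {n}, T = {T}, N = n x T = {n * T}, balanced = {is_balanced}")
```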

The Linear Panel Model

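The basic linear panel model used in analytics can be described through restrictions of the following general model:

y_it = α_it + β_it x_it + u_it

What is i? The individual index, i = 1, …, n.

What is t? The time index, t = 1, …, T.

What is u_it? A random disturbance term with mean 0.

Standard Linear Model

Of course, the general model is not estimable with N = n × T data points, because it has more parameters than observations.

Therefore the parameters are assumed to be homogeneous, α_it = α and β_it = β for all i and t, which gives

y_it = α + β x_it + u_it

Here α and β are common coefficients that do not vary by individual or by time.

To model individual heterogeneity, the error term is split into two components, u_it = μ_i + ε_it: μ_i is specific to the individual and does not change over time, and ε_it is the remaining idiosyncratic error with mean zero.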
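To see why the individual component of the error matters, here is a minimal sketch that assumes the model y_it = α + β x_it + μ_i + ε_it, with μ_i an individual effect that is constant over time. It compares a pooled slope (ignoring μ_i) with a "within" slope computed after subtracting each individual's own means. The data are made up with a true slope of 2 and no noise; the within transformation is one common way to handle μ_i, not necessarily the one used in the course.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean(values):
    """Subtract the individual's own mean (removes anything time-invariant)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Made-up panel: y = mu_i + 2 * x, with mu_A = 1 and mu_B = 10.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "y": [3.0, 5.0, 7.0]},
    "B": {"x": [4.0, 5.0, 6.0], "y": [18.0, 20.0, 22.0]},
}

# Pooled model: stack all observations and ignore who they belong to.
pooled_x = [v for ind in panel.values() for v in ind["x"]]
pooled_y = [v for ind in panel.values() for v in ind["y"]]
pooled_beta = slope(pooled_x, pooled_y)

# Within model: demean per individual first, then stack.
within_x = [v for ind in panel.values() for v in demean(ind["x"])]
within_y = [v for ind in panel.values() for v in demean(ind["y"])]
within_beta = slope(within_x, within_y)

print(f"pooled beta = {pooled_beta:.3f}")
print(f"within beta = {within_beta:.3f}")
```

In this example the individual effect is correlated with x, so the pooled slope overstates the true slope, while the within slope recovers it.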

Introduction

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary).

Dichotomous means

True / False

1/0

Yes/No

Male/Female

College degree/No college degree

Relapse of disease/ No relapse of disease

Success/Failure

Public company / Private company

Profitable / Not Profitable

Why do it?

Generalized Linear Model

g(E(y)) = α + βx1 + γx2

g() is the link function,

E(y) is the expectation (mean) of the target variable,

α + βx1 + γx2 is the linear predictor (the regressors and their coefficients).

The task is to estimate α, β and γ.
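The role of the link function can be illustrated with a small sketch. It computes the linear predictor α + βx1 + γx2 once and then maps it back to E(y) through the inverse of three common links. The coefficient and regressor values are hypothetical, and the choice of links (identity, log, logit) is an assumption made for illustration.

```python
import math

# Each entry is (link g, inverse link). g maps E(y) to the linear predictor.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
}

# Hypothetical coefficients and regressor values.
alpha, beta, gamma = -1.0, 0.5, 0.25
x1, x2 = 2.0, 4.0

# Linear predictor: alpha + beta * x1 + gamma * x2.
eta = alpha + beta * x1 + gamma * x2

# E(y) under each link is the inverse link applied to the linear predictor.
means = {name: inverse(eta) for name, (g, inverse) in LINKS.items()}

for name, mu in means.items():
    print(f"{name:8s} link: E(y) = {mu:.4f}")
```

Only the logit link is guaranteed to return a value between 0 and 1, which is why it suits a binary target.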

Example of using Logistic regression

Say you have 1000 customers and want to predict the probability that a customer will buy a particular magazine.

g(y) = βo + β(Age)

This is a simple linear regression model with Age as the independent variable.

We only want the probability of the outcome (the dependent variable).

The problem is that a probability must always be positive, so we bring in the exponential function ‘e’:

p = exp(βo + β(Age)) = e^(βo + β(Age))

A probability must also be at most 1, so we divide by the same quantity plus one:

p = e^(βo + β(Age)) / (1 + e^(βo + β(Age)))
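A short numeric sketch of the magazine example follows. It assumes made-up coefficients (βo = -4, β = 0.1) and takes the usual final step of dividing the exponential by one plus itself, so that the result is a valid probability between 0 and 1. The function name `purchase_probability` is hypothetical.

```python
import math

# Hypothetical coefficients; real values would be estimated from the data.
beta0 = -4.0
beta_age = 0.1


def purchase_probability(age):
    """Probability that a customer of the given age buys the magazine."""
    z = beta0 + beta_age * age
    odds = math.exp(z)          # always positive, but can exceed 1
    return odds / (1 + odds)    # squeezed into the interval (0, 1)


for age in (20, 40, 60):
    print(f"age {age}: p = {purchase_probability(age):.3f}")
```

With these assumed coefficients the probability rises with age and crosses 0.5 at age 40, where the linear predictor equals zero.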

Probit regression.

In probability theory and statistics, the probit function is the inverse cumulative distribution function (CDF), or quantile function, associated with the standard normal distribution.

It has applications in exploratory statistical graphics and in specialized regression modeling of binary response variables.

Prob(Y = 1 | X) = Φ(α + βX), where Φ is the CDF of the standard normal distribution.
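The probit function and the corresponding model probability can be sketched with Python's standard library. The example assumes the usual probit form Prob(Y = 1) = Φ(α + βX), where Φ is the standard normal CDF; the coefficient values passed to `prob_y1` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def probit(p):
    """Inverse CDF (quantile function) of the standard normal distribution."""
    return std_normal.inv_cdf(p)


def prob_y1(alpha, beta, x):
    """Probit model probability: Phi(alpha + beta * x)."""
    return std_normal.cdf(alpha + beta * x)


print(f"probit(0.5)   = {probit(0.5):.4f}")
print(f"probit(0.975) = {probit(0.975):.4f}")
# Hypothetical coefficients alpha = -1.0, beta = 0.5 at x = 3.
print(f"Prob(Y=1)     = {prob_y1(-1.0, 0.5, 3.0):.4f}")
```

Applying `probit` to a model probability returns the linear predictor, which is what makes the probit a link function.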

Choice theory

A biologist may be interested in food choices that alligators make. Adult alligators might have different preferences from young ones. The outcome variable here will be the types of food, and the predictor variables might be size of the alligators and other environmental variables.

Entering high school students make program choices among general program, vocational program and academic program. Their choice might be modeled using their writing score and their social economic status.

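The multinomial logistic model is as follows: for each alternative j in the choice set,

P(Y = j | x) = exp(α_j + β_j x) / Σ_k exp(α_k + β_k x)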

Choice results

Multinomial choice logit models estimate the relative probability of choosing each alternative in a given choice set, as a function of variables that describe the alternatives.

They provide a different mechanism for understanding choice selections.
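As a sketch of how such a model turns predictors into choice probabilities, the code below uses the standard multinomial logit form, in which each alternative gets a linear score and the probabilities are the normalised exponentials of those scores. It borrows the high school program example from above; the intercepts and writing-score coefficients are made up, with the general program as the reference alternative.

```python
import math


def choice_probabilities(scores):
    """Turn one linear score per alternative into probabilities summing to 1."""
    largest = max(scores.values())
    # Subtracting the largest score avoids overflow and leaves the result unchanged.
    exps = {name: math.exp(s - largest) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical (intercept, writing-score coefficient) per program.
# The reference alternative has both fixed at zero.
coefs = {
    "general": (0.0, 0.0),
    "vocational": (2.0, -0.05),
    "academic": (-2.0, 0.05),
}


def program_scores(writing_score):
    return {name: a + b * writing_score for name, (a, b) in coefs.items()}


probs = choice_probabilities(program_scores(60))
for name, p in probs.items():
    print(f"{name:10s} {p:.3f}")
```

The ratio of any two probabilities depends only on the difference between their scores, which is the sense in which the model describes relative choice probabilities.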

Why group or cluster data?

How do we determine the groups of customers so as to apply marketing strategies to these groups?

How do we alter the behaviors of a group such as millennials?

How can we determine the types of criminal or felony behaviors that exist in the world?

How can we classify a large set of birds into their species?

Clustering Techniques

Different methods for clustering

Regular

K-means clustering

K-medians clustering

Hierarchical

Agglomerative clustering

Divisive clustering
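K-means, the first method listed above, can be sketched in a few lines. The version below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The data are made up, and the initialisation (taking the first k points as starting centroids) is a simplifying assumption chosen to keep the example deterministic; practical implementations usually use random or smarter starting points.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, iterations=100):
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            mean_point(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)

for centroid, members in zip(centroids, clusters):
    print(f"centroid {centroid}: {members}")
```

Note that k has to be chosen in advance, which is the main contrast with hierarchical clustering.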

Hierarchical clustering

Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in the dataset.

It does not require us to pre-specify the number of clusters to be generated as is required by the k-means approach.

Furthermore, hierarchical clustering results in an attractive tree-based representation of the observations, called a dendrogram.
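The tree that a dendrogram draws is simply the record of which clusters were merged and at what distance. The sketch below builds that record bottom-up (agglomeratively) for a handful of made-up one-dimensional values, using the minimum distance between members as the dissimilarity between clusters. Both the data and the choice of single linkage are assumptions for illustration.

```python
def single_linkage(a, b):
    """Dissimilarity between two clusters: the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(values):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[v] for v in values]    # start with one cluster per value
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


merges = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, height in merges:
    print(f"merge {left} + {right} at height {height}")
```

Cutting the merge history at a chosen height gives a flat clustering, so the number of clusters can be decided after the tree is built.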

Rule of thumb

AGNES (agglomerative nesting) clustering is good at identifying small clusters.

DIANA (divisive analysis) clustering is good at identifying large clusters.

Measuring dissimilarity between clusters

There are five common methods to measure the dissimilarity between clusters:

Complete linkage (maximum)

Single linkage (minimum)

Average linkage (mean)

Centroid linkage

Ward’s minimum variance method
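The five measures listed above can be compared on a pair of small made-up clusters. The sketch uses Euclidean distance between points. For Ward's method it computes the increase in the total within-cluster sum of squares caused by merging the two clusters, which is the standard merge cost for that method. Function names are hypothetical.

```python
import math
from itertools import product


def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def complete_linkage(a, b):
    """Largest distance between a member of a and a member of b."""
    return max(math.dist(p, q) for p, q in product(a, b))


def single_linkage(a, b):
    """Smallest distance between a member of a and a member of b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between the two clusters."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_linkage(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * math.dist(centroid(a), centroid(b)) ** 2


# Made-up clusters: two vertical pairs of points, three units apart.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]

for measure in (complete_linkage, single_linkage, average_linkage,
                centroid_linkage, ward_cost):
    print(f"{measure.__name__:17s} {measure(a, b):.4f}")
```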

Bayes theorem

Given a hypothesis H and evidence E, Bayes' theorem states that the relationship between the probability of the hypothesis before getting the evidence, P(H), and the probability of the hypothesis after getting the evidence, P(H|E), is

P(H|E) = [P(E|H) / P(E)] × P(H)

This is very important because the probability is updated each time there is new evidence or, in this case, new features.

The ratio P(E|H)/P(E) is the factor by which P(H) is updated each time new results show up. It is sometimes called the likelihood ratio.
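A worked update makes the formula concrete. All probabilities below are made up for illustration: the hypothesis has a prior of 1%, the evidence appears 90% of the time when the hypothesis is true and 5% of the time when it is false. P(E) is obtained from the law of total probability.

```python
# Hypothetical inputs.
p_h = 0.01                # P(H): prior probability of the hypothesis
p_e_given_h = 0.90        # P(E|H): probability of the evidence if H is true
p_e_given_not_h = 0.05    # P(E|not H): probability of the evidence if H is false

# Law of total probability: P(E) = P(E|H)P(H) + P(E|not H)P(not H).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Updating factor applied to the prior.
update_factor = p_e_given_h / p_e

# Posterior probability of the hypothesis given the evidence.
p_h_given_e = update_factor * p_h

print(f"P(E)          = {p_e:.4f}")
print(f"update factor = {update_factor:.2f}")
print(f"P(H|E)        = {p_h_given_e:.4f}")
```

The posterior from one update can serve as the prior for the next piece of evidence, which is how the probability keeps being revised as new features arrive.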

Naïve Bayesian Classifier

Setup:

Data D = {(x(1), y(1)), (x(2), y(2)), …, (x(n), y(n))}

x(i) belongs to R^d

y(i) belongs to some finite set Y

Assumptions for a probabilistic model:

1. A family of distributions pθ, parametrized by θ, such that pθ(x, y) = pθ(x|y) pθ(y) = pθ(x1|y) ⋯ pθ(xd|y) pθ(y)

2. The pairs (x(1), y(1)), …, (x(n), y(n)) are drawn independently and identically distributed from pθ; by the factorization above, the features x1, …, xd are conditionally independent given y

Problem: For a new ‘x’ predict its ‘y’

Algorithm:

1. Estimate θ from D

2. Compute

ŷ = argmax_y pθ(y|x) = argmax_y pθ(x|y) pθ(y) / pθ(x) = argmax_y pθ(x|y) pθ(y)

since pθ(x) does not depend on y.
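The two algorithm steps can be sketched as follows. Because the features are real-valued, this example assumes each feature follows a normal distribution within each class (Gaussian naive Bayes), which is one common choice and not something the slides specify. The training data, class labels and function names are made up. Scores are computed in logarithms and the denominator is dropped, since it is the same for every class.

```python
import math
from collections import defaultdict


def fit(X, y):
    """Step 1: estimate theta (class priors, per-feature means and variances)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    theta = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A tiny constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        theta[label] = (prior, means, variances)
    return theta


def log_gaussian(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict(theta, x):
    """Step 2: return the label maximising log p(y) + sum_j log p(x_j | y)."""
    def score(label):
        prior, means, variances = theta[label]
        return math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances)
        )
    return max(theta, key=score)


# Made-up training data with d = 2 features and two classes.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = ["low", "low", "low", "high", "high", "high"]

theta = fit(X, y)
print(predict(theta, (1.1, 2.1)))
print(predict(theta, (5.1, 5.9)))
```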