SAS program problem set

profileIbra1989
HW4.pdf

Econometrics 673 Fall 2017 Homework #4 Due Monday, October 16, 2017

How Far do My Customers Travel

From the class website, download the “HW4Data.zip” ZIP file. There are two files in the compressed zip file, “hw4_data_stack.sas7bdat” which is a SAS dataset, and “Location Lat Long Data.csv”. The first few rows of SAS dataset should look like the following:

There are 27 unique values for “location_id” and in total there should be 5,632,010 observations in these data. This data is transactions sales data for a retail establishment based on a loyalty card program. The variable “Customer_ID” is a unique identifier for a specific loyalty card member. The variables “latitude” and “longitude” are the physical latitude and longitude of customer’s home address; “quantity” is the quantity purchased; and “date” is the transaction date. To determine how far customers travel to reach the specific retailing location from which these data were recorded, merge the “Location Lat Long Data.csv” data into the transactions data and use the GEODIST function to provide the straight-line distance (measured in miles) between two map locations accounting for the curvature of the earth. Let “ref_latitude” and “ref_longitude” be the physical latitude and longitude for the retail location; latitude and longitude from the sales data are the customer’s home location. Then the straight- line distance traveled by a particular customer to the retail location is given by:

Miles=GEODIST(latitude, longitude, ref_latitude, ref_longitude, ‘M’); Use this formula in SAS to calculate the straight-line distances (in miles) for each transaction.

1. Using these distances, create and distance class variable for purposes of constructing frequency distributions and cumulative distributions (counts and percentages) for each location_id to illustrate the proportion of sales associated with travel distances in half mile increments up to 20 miles. For example:

a. Dist_class=0; b. If 0<miles<=0.5 then dist_class=0.5; c. If 0.5<miles<=1 then dist_class=0.1; d. Etc. (A DO loop might make this easier)

2. Aggregate the data by distance class. In conducting your analysis address the following questions: a. Does your distribution analysis represent a line-item transaction count or a sales (quantity)

count? Which do you think is more appropriate? b. In constructing your frequency distributions, do you think it is appropriate to use all the available

data, or should some transactions be excluded as “unusual” or out of the norm (keep in mind that this is transactions data for a retail establishment)?

c. Given your analysis, for each of the 27 locations, how large must the draw area be to account for 75% of sales? 90% of sales?

d. What is the average distance traveled per sale? Is this a simple average or weighted average? Which do you think is more appropriate?

e. Among the set of unique Customer_ID values, what is the shortest distance traveled; the longest; the highest total quantity sales; the average quantity sales?

Prepare a (short) summary report that includes a summary table similar to the attached and answers to the questions posed above. Clearly identify any assumptions you must make in conducting your analysis.

Customer ID Analysis

Location ID

Average Distance Traveled

Distance Category Cut-off for 75% of Sales

Distance Category Cut-off for 90% of Sales

Minimum Distance

Maximum Distance

Maximum Quantity Sales

Average Quantity Sales

1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027