SAS program problem set
Econometrics 673 Fall 2017 Homework #4 Due Monday, October 16, 2017
How Far do My Customers Travel
From the class website, download the “HW4Data.zip” ZIP file. There are two files in the compressed zip file, “hw4_data_stack.sas7bdat” which is a SAS dataset, and “Location Lat Long Data.csv”. The first few rows of SAS dataset should look like the following:
There are 27 unique values for “location_id” and in total there should be 5,632,010 observations in these data. This data is transactions sales data for a retail establishment based on a loyalty card program. The variable “Customer_ID” is a unique identifier for a specific loyalty card member. The variables “latitude” and “longitude” are the physical latitude and longitude of customer’s home address; “quantity” is the quantity purchased; and “date” is the transaction date. To determine how far customers travel to reach the specific retailing location from which these data were recorded, merge the “Location Lat Long Data.csv” data into the transactions data and use the GEODIST function to provide the straight-line distance (measured in miles) between two map locations accounting for the curvature of the earth. Let “ref_latitude” and “ref_longitude” be the physical latitude and longitude for the retail location; latitude and longitude from the sales data are the customer’s home location. Then the straight- line distance traveled by a particular customer to the retail location is given by:
Miles=GEODIST(latitude, longitude, ref_latitude, ref_longitude, ‘M’); Use this formula in SAS to calculate the straight-line distances (in miles) for each transaction.
1. Using these distances, create and distance class variable for purposes of constructing frequency distributions and cumulative distributions (counts and percentages) for each location_id to illustrate the proportion of sales associated with travel distances in half mile increments up to 20 miles. For example:
a. Dist_class=0; b. If 0<miles<=0.5 then dist_class=0.5; c. If 0.5<miles<=1 then dist_class=0.1; d. Etc. (A DO loop might make this easier)
2. Aggregate the data by distance class. In conducting your analysis address the following questions: a. Does your distribution analysis represent a line-item transaction count or a sales (quantity)
count? Which do you think is more appropriate? b. In constructing your frequency distributions, do you think it is appropriate to use all the available
data, or should some transactions be excluded as “unusual” or out of the norm (keep in mind that this is transactions data for a retail establishment)?
c. Given your analysis, for each of the 27 locations, how large must the draw area be to account for 75% of sales? 90% of sales?
d. What is the average distance traveled per sale? Is this a simple average or weighted average? Which do you think is more appropriate?
e. Among the set of unique Customer_ID values, what is the shortest distance traveled; the longest; the highest total quantity sales; the average quantity sales?
Prepare a (short) summary report that includes a summary table similar to the attached and answers to the questions posed above. Clearly identify any assumptions you must make in conducting your analysis.
Customer ID Analysis
Location ID
Average Distance Traveled
Distance Category Cut-off for 75% of Sales
Distance Category Cut-off for 90% of Sales
Minimum Distance
Maximum Distance
Maximum Quantity Sales
Average Quantity Sales
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027