
FA2023BUSI650CaseStudy2-HotelBookngsProject.docx


Case Study: Hotel Booking Cancellation

Context:

A significant number of hotel bookings end in cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Canceling is often made easier by the option to do so free of charge or at a low cost. This benefits hotel guests, but it is less desirable for hotels and can reduce their revenue. The losses are particularly high for last-minute cancellations.

The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.

This pattern of cancellations of bookings impacts a hotel on various fronts:

1. Loss of resources (revenue) when the hotel cannot resell the room.

2. Additional distribution-channel costs from higher commissions or paid publicity to help sell these rooms.

3. Lowering prices at the last minute so the hotel can resell a room, which reduces the profit margin.

4. Human resources to make arrangements for the guests.

Objective:

The objective is to build a predictive model, using historical booking data, so that hotel management can predict in advance which bookings are likely to be canceled. This will help the hotel group formulate profitable policies for cancellations and refunds.

Considerations:

· The hotel group wants to predict which bookings might result in cancellations.

· The predictive model can make two types of errors:

a. Flagging a booking as a potential cancellation when in actuality the booking would not be canceled (False Positive). This could result in an overbooking scenario where the hotel might not be able to provide satisfactory service to the customer (the customer may have to wait or be relocated to another hotel), thus damaging brand equity.

b. Predicting that a booking will not get canceled but the booking gets canceled (False Negative). In this case the hotel will lose resources and may have to resell the room at heavily discounted rates.

Note: Since we do not know which of the above two errors is more costly for the hotel, you should try to get a model that gives a balanced performance between Precision and Recall, i.e. we want a high F1 score.
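As a minimal sketch of why the F1 score balances the two errors, the snippet below computes Precision, Recall, and F1 from confusion-matrix counts, treating "canceled" as the positive class. The function name and the counts are made up for illustration and are not part of the assignment data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class, here 'canceled'."""
    # Precision falls as False Positives (wrongly flagged bookings) grow.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall falls as False Negatives (missed cancellations) grow.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean, so it is high only when both are high.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 cancellations caught, 20 false alarms, 40 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.800 recall=0.667 f1=0.727
```

Because F1 is a harmonic mean, a model cannot score well by driving down only one type of error.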

The data set “INNHotelsGroup.csv” can be downloaded from the Data sets folder in CANVAS.

Data Dictionary

· Booking_ID: Unique identifier of each booking

· no_of_adults: Number of adults

· no_of_children: Number of children

· no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel

· no_of_week_nights: Number of weekday nights (Monday to Friday) the guest stayed or booked to stay at the hotel

· type_of_meal_plan: Type of meal plan booked by the customer:

  · Not Selected – No meal plan selected

  · Meal Plan 1 – Breakfast

  · Meal Plan 2 – Half board (breakfast and one other meal)

  · Meal Plan 3 – Full board (breakfast, lunch, and dinner)

· required_car_parking_space: Does the customer require a car parking space? (0 - No, 1 - Yes)

· room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels.

· lead_time: Number of days between the date of booking and the arrival date

· arrival_year: Year of arrival date

· arrival_month: Month of arrival date

· arrival_date: Day of the month of the arrival date

· market_segment_type: Market segment designation.

· repeated_guest: Is the customer a repeated guest? (0 - No, 1 - Yes)

· no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current booking

· no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the current booking

· avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)

· no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room, etc.)

· booking_status: Flag indicating if the booking was canceled or not.
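The sketch below shows one way the columns above might be prepared for modeling with pandas. The four sample rows are invented, and the label text "Canceled" / "Not_Canceled" for booking_status is an assumption, since the dictionary does not say how the flag is encoded. Check the actual values in the CSV before reusing any of this.

```python
import pandas as pd

# Tiny made-up sample; the real data would come from pd.read_csv("INNHotelsGroup.csv").
df = pd.DataFrame({
    "Booking_ID": ["B1", "B2", "B3", "B4"],
    "no_of_adults": [2, 1, 2, 2],
    "type_of_meal_plan": ["Meal Plan 1", "Not Selected", "Meal Plan 2", "Meal Plan 1"],
    "lead_time": [224, 5, 1, 211],
    "avg_price_per_room": [65.0, 106.68, 60.0, 100.0],
    "booking_status": ["Not_Canceled", "Not_Canceled", "Canceled", "Canceled"],  # assumed labels
})

# Booking_ID is a unique identifier, so it carries no predictive signal.
df = df.drop(columns=["Booking_ID"])

# Count missing values per column before deciding how to treat them.
missing_counts = df.isna().sum()

# Encode the target as 1 = canceled, 0 = not canceled (label text is an assumption).
y = (df["booking_status"] == "Canceled").astype(int)

# One-hot encode the categorical predictors; drop_first avoids a redundant column.
X = pd.get_dummies(df.drop(columns=["booking_status"]), drop_first=True)

print(missing_counts.sum(), list(X.columns), y.tolist())
```

Columns such as room_type_reserved and market_segment_type would be one-hot encoded by the same `pd.get_dummies` call if they were present in the frame.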

Tasks and rubric:

1. Explore: 4 points

Examine the data set and carry out EDA, in particular showing how the other variables may be related to the target variable (booking_status) through bar plots, line plots, box plots, etc. Interpret the results and derive initial insights.

2. Data preparation: 2 points

· Drop variables that are not relevant to prediction, check for and treat missing values, and create dummy variables if needed.

· Separate the predictor and target variables and split the data into train and test sets, using a 70:30 split.

3. Decision Tree model: 2 points

· Build a Decision Tree model. Calculate the various evaluation metrics for the train and test sets. Draw up the confusion matrix for the test set. Evaluate the performance of the model.

4. Tuning and improving the model: 4 points

· Tune the Decision Tree hyperparameters using GridSearchCV.

· Fit the tuned model with the best parameters and compare the performance of the basic model and the tuned model on the test set.

5. Logistic Regression Model: 2 points

· Build a Logistic Regression model. Calculate the various evaluation metrics for the train and test sets. Draw up a confusion matrix for the test set. Evaluate the performance of the model.

6. Optimize the LR model: 2 points

· Ascertain the optimal threshold through the Precision-Recall curve.

· Evaluate the performance of the model using the optimal threshold.

7. Insights: 4 points

· Compare the performance of all four models on the test set

· Choose the best model – give reasons

· Based on the best model, which are the most important predictive features?

· List out the business insights, based on your EDA and the chosen model
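The modeling tasks above can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not a reference solution. It uses synthetic data from `make_classification` in place of the hotel data. The hyperparameter grid is arbitrary. "Optimal threshold" is read here as the threshold that maximizes F1 on the training set, which is only one possible reading of the task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared predictors X and target y (1 = canceled).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.67, 0.33], random_state=1)

# 70:30 split, stratified so both sets keep a similar cancellation rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

# 1) Baseline decision tree with default hyperparameters.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# 2) Tuned tree: cross-validated grid search scored on F1 (grid is arbitrary).
param_grid = {"max_depth": [3, 5, 7, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid,
                      scoring="f1", cv=5)
search.fit(X_train, y_train)
tuned_tree = search.best_estimator_  # already refit on the full training set

# 3) Logistic regression at the default 0.5 threshold.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 4) Same model, threshold chosen on the training set to maximize F1.
train_proba = logreg.predict_proba(X_train)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, train_proba)
# precision and recall have one more entry than thresholds; drop the final point.
f1_curve = 2 * precision[:-1] * recall[:-1] / np.maximum(
    precision[:-1] + recall[:-1], 1e-12)
best_threshold = thresholds[np.argmax(f1_curve)]
test_proba = logreg.predict_proba(X_test)[:, 1]

# Compare all four models on the test set.
test_predictions = {
    "tree (default)": tree.predict(X_test),
    "tree (tuned)": tuned_tree.predict(X_test),
    "logreg (0.5)": logreg.predict(X_test),
    "logreg (tuned)": (test_proba >= best_threshold).astype(int),
}
scores = {name: f1_score(y_test, pred) for name, pred in test_predictions.items()}
for name, score in scores.items():
    print(f"{name:16s} test F1 = {score:.3f}")

# Confusion matrix layout: rows are actual, columns are predicted, order [0, 1].
cm = confusion_matrix(y_test, tuned_tree.predict(X_test))
print(cm)
```

On a fitted tree, the `feature_importances_` attribute gives one view of which predictors matter most. For logistic regression, the sign and size of `coef_` play a similar role when the features are on comparable scales.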

Guidelines for submitting:

· Annotate your Jupyter Notebook to explain your procedures and to record your comments and conclusions.

· After completion, run the Jupyter notebook from start to finish.

· Download the notebook in HTML format and upload it to the assignment space on CANVAS.

BUSA701Online-ExerciseonDataExploration.docx

EXERCISE ON DATA EXPLORATION IN TABLEAU PREP

Use file: “Data Exploration Exercise.tflx” posted in this week’s module. Once the file is loaded, click on “clean 1” to see the data and answer the following questions:

QUESTIONS

1. What is the total number of flights?

2. How many cities have the flights originated from?

3. What is the most common range of distances flown?

4. What is the most common distance flown?

5. How many flights originating from the most popular city (the city where the most flights landed) landed in the state of New York?

6. Of all the flights that landed in Denver, what % originated in Nashville?

7. How many flights were flown on 7/22/2016?

Note: Please complete the exercise in Tableau Prep. You should not use Excel or manual calculations. Please write down your answers in a .doc file, including some explanation of how you got the answers in Tableau Prep, and upload your submission (.doc file) in the assignments area.

This homework will be graded out of 5 points.
