queation 1,4,5,6

philxu1988

rm_dep_demands_2013v2.pdf

Home >Mathematics homework help >queation 1,4,5,6

Single Resource Revenue Management Problems with Dependent Demands c©

Guillermo Gallego

Spring 2013

Abstract

Providers of perishable capacity, such as airline seats and hotel rooms, use market segmentation tools to sell the same capacity to different customers at difference prices. They impose time-of-purchase and usage restrictions such as Saturday Night stays on low fares to increase demand while limiting bookings to prevent demand cannibalization from customers who book late and travel for business. Capacity providers may also bundle or unbundle ancillary services such as advance seat selection, meals, mileage accrual, and luggage handling to justify price differences for the same basic service. We will assume that the fare class structure (a menu of prices, restrictions and ancillary services) is given and that the capacity provider’s goal is to allocate capacity among fare classes to maximize expected revenues net of the cost of the ancillary services. We assume that customer demand is governed by a discrete choice model and depends on the products available for sale. Thus, if a customer does not find his preferred product, he may decide not to purchase or to purchase an alternative product in the offer set. The problem is to decide, at each state, which set of products to make available for sale. Since there are an exponential number of sets that can be offered, we study the structure of the optimization problem to define and characterize efficient sets with the purpose of reducing the dimensionality of the optimization problem and to obtain insights into the type of products, e.g., nested-by-fares, that may be optimal to offer. We provide a dynamic programming formulation for the revenue management problem and use fare and demand transformations to relate it to the formulation of the independent demand model. We present an upper bound on the value function throughout the use of approximate dynamic programming with affine functions. A formulation that does not allow fares to be opened once they are closed is also presented. We then turn our attention to static models, where the time element is suppressed. We present an optimal solution to the two fare problem and an efficient heuristic for the multi-fare case. Numerical examples show that the proposed heuristic works almost as well as the optimal solution for the restricted dynamic programming model where fares cannot be reopened once they are closed.

1 Introduction

Suppliers of perishable capacity often offer a menu of products that vary in terms of price and quality. If the products differ only on price we would expect most, if not all, customers to buy the lowest priced product that gives him a positive surplus and to walk away otherwise. When differences in quality are also present, customers do not always buy the lowest priced product. As an example, a customer may be willing to pay an extra $50 for a room with an ocean view or $29 for advance seat selection and priority boarding.

When the difference between products in menu is small, there is a real likelihood that a customer will buy a different product if his preferred product is not available. This is known as demand recapture. Demand that is lost because a product is not offered is called spilled demand. Under the independent demand model, all demand for an unavailable product is spilled. In contrast, under a discrete choice model, part of this demand may be recaptured by other available products. In the context of revenue management, the independent demand model results in low protection levels for higher fare classes as it ignores demand recapture. This causes higher spill rates among the higher fare classes and leads to a downward spiral in revenues as estimates of demands for high fare classes become lower over time as explained in Cooper, Homem de Mello and Kleywegt [4].

Revenue mangers have been struggling for decades with the problem of finding optimal control mechanisms for fare class structures with dependent demands. Many attempts have been made by research groups in industry to cope with this problem. At the same time aca- demic research has moved vigorously to tackle the problem of capacity allocation under choice models. A key difference between is how the notion of time is handled. Practitioners tend to prefer to model time implicitly, e.g., through static models that are extensions of Littlewood’s rule and EMSR type heuristics. However, finding the right way to extend Littlewood’s rule proved to be more difficult than anticipated. The key complication is that the marginal value of capacity is more difficult to compute due to the fact that protecting an additional unit of capacity for later sale also alters the number of potential customers (as some of their demand may be recaptured). Nevertheless, it is possible to overcome this difficulty and find static heuristic policies that are easy to implement and perform almost as well as dynamic policies.

Academic researchers tend to prefer models where time is handled explicitly. In contrast to the independent demand formulations, where static models are relatively easier to understand and implement, the opposite seems to be true for dependent demand models. Indeed, dynamic formulations of the dependent demand model are only marginally more complicated to set up. The problem is a bit more difficult because it requires specifying a choice model and identifying the collection of “efficient” sets of products that may be offered at different states of the system. Once the efficient sets are identified, either exactly or approximately, the problem can be reformulated to be essentially equivalent to that of the independent demand model.

In this chapter we will explore both formulations. We start by introducing discrete choice models as a tool to model demand dependencies. After introducing choice models and provid- ing a variety of examples, we discuss the sales and revenue rates associated with each set and give a formal definition of efficient sets. Essentially efficient sets maximize the revenue rate

among all convex combinations of sets whose sales rate is bounded by the sales rate of the efficient set. The notion of efficient sets helps simplify the the dynamic programming formu- lations so only efficient sets need to be considered. Structural results are presented that are useful both in terms of computations and in understanding dynamic policies. In particular, we relate the formulation to that of the independent demand model and to dynamic pricing with finite price menus. We then present an upper bound on revenues, and a variant of the dynamic model where fares cannot be opened once they are closed. We end the chapter with a discussion of static models that handle time implicitly. As the reader will see, these models are complicated by the fact that changing the protection level also changes the number of potential customers for higher fare classes.

2 Discrete Choice Models

We will assume that the set of potential products to be offered is N = {1, . . . ,n}. For any subset S ⊂ N, denote by πj(S) the probability that a customer will select j ∈ S with πj(S) = 0 if j ∈ S′ = {j ∈ N : j /∈ S}. Let π(S) =

∑ j∈S πj(S) denote the probability of sale when subset

S is offered. The complement π0(S) = 1−π(S) denotes the probability that a customer selects an outside alternative. An outside alternative may mean either that the customer does not purchase or that he purchases from another vendor. Let S+ = S∪{0} denote the set of offered products together with the outside alternative. Then πS+ (S) = π(S) + π0(S) = 1. Notice that this notation implies that the no-purchase alternative j = 0 is always available. For this reason, it would be more appropriate to write πj(S+) for j ∈ S+. However, we follow here the convention of writing πj(S) instead of the more cumbersome πj(S+) with the understanding that the no-purchase alternative j = 0 is always available. This choice of notation makes it easier to refer to the set S ⊂ N of offered products, but the reader needs to remember that implicitly the no-purchase alternative is always available to the customer. We will describe a few commonly used choice models, define efficient sets, and then move on to the problem of dynamic capacity allocation.

One reason to look at discrete choice models is to better understand how customers make choices. Another important reason for our purposes is to find a subset of products that maximizes the expected profit or expected revenues among all S ⊂ N.

2.1 The Independent Demand Model (IDM)

Under this model πj(S) is independent of the offer set S (containing j). Under the IDM all the demand for fare j is lost (to the no-purchase alternative) if j is removed from S. This implies that there are non-negative constants, say v0,vj,j ∈ N, such that

πj(S) = vj

v0 + V (N) j ∈ S

where V (N) = ∑ j∈N vj ( the constants can be normalized so that v0 + V (N) = 1). In most

practical situations, it is reasonable to expect that some of the demand for fare j may be

recaptured by other products in S. The IDM is pessimistic in the sense that it ignores recap- ture. This may lead to incorrect decisions in the context it is used. In Revenue Management applications it usually leads to offering too much capacity at discounted fares. In assortment planning, it may lead to offering excessive product variety.

2.2 The Basic Attraction Model and the Multinomial Logit Model

The Basic Attraction Model (BAM) is a discrete choice model where each fare j ∈ N has an attraction value vj > 0 and v0 > 0 represents the attractiveness of the no-purchase alternative. The choice model is given by

πj(S) = vj

v0 + V (S) ∀j ∈ S, (1)

where V (S) = ∑ j∈S vj. Consequently, products with higher attraction values are more likely

to be selected.

The BAM was first proposed by Luce [12] who postulated two choice axioms and demon- strated that a discrete choice model satisfies the axioms if and only if it is of the BAM form. To adequately describe the Luce Axioms we need additional notation. For any S ⊂ T we let πS(T) =

∑ j∈S πj(T) denote the probability that a customer selects a product in S when the

set T is offered. Also, πS+ (T) = πS(T) + π0(T) = 1 −πT−S(T), where set differences T −S mean T ∩S′. The Luce Axioms can be written as:

• Axiom 1: If πi({i}) ∈ (0, 1) for all i ∈ T, then for any R ⊂ S+, S ⊂ T

πR(T) = πR(S)πS+ (T).

• Axiom 2: If πi({i}) = 0 for some i ∈ T, then for any S ⊂ T such that i ∈ S

πS(T) = πS−{i}(T −{i}).

Axiom 1 implies that the probability of selecting any set R ⊂ S+, when set T is offered, is equal to the probability of selecting R when S is offered times the probability of selecting S+ when T is offered assuming that S ⊂ T. Axiom 2 implies that if alternative i has no probability of being chosen, then it can be deleted from S without affecting the choice probabilities (Luce [13]).

The celebrated multinomial logit (MNL) model, see McFadden [16], is a special case of the BAM that arises from a random utility model. Under a random utility model, each product has random utility Ui, i ∈ N+ and the probability that j ∈ S is selected is given by πj(S) = P(Uj ≥ Ui, i ∈ S+). In words, product j ∈ S is selected if it gives as much utility as any other product in S+. More specifically, McFadden models the random utility of product i, as Ui = µi + �i, where µi is the mean utility of product i and depends on the price and quality of product i. The term �i is modeled as an extreme value distribution known as a Gumbel random variable with parameter φ. The �is are assumed to be independent and

identically distributed. We can think of �i as an idiosyncratic variation on the mean utility or as errors in measuring the utility. Under these assumptions, πj(S) is of the BAM form with vi = e

φµi, i ∈ N. The parameter φ is inversely related to the variance of the �js. As φ becomes large, the variance of the �js, becomes small, and customers will gravitate to the product in S with the largest mean utility µi (what is known as the maximum utility model). On the other hand, when φ becomes small the probability of selecting any i ∈ S converges to a uniform distribution where each product is equally likely to be selected. This is because when the variance is much larger than the mean, the customer loses the ability to reliability select products with higher mean utility.

2.3 The Generalized Attraction Model

There is considerable empirical evidence that the basic attraction model (BAM) may be too optimistic in estimating demand recapture probabilities when the customer’s first choice is not part of the offer set S. The BAM assumes that even if a customer prefers j ∈ S′, he must select among k ∈ S+. This ignores the possibility that the customer may look for products j ∈ S′ elsewhere or at a later time. As an example, suppose that a customer prefers a certain wine, and the store does not have it. The customer may then either buy one of the wines in the store, go home without purchasing, or drive to another store and look for the specific wine he wants. The BAM precludes the last possibility; it implicitly assumes that the search cost for an alternative source of product j ∈ S′ is infinity, or equivalently that there is no competition.

As an illustration, suppose that the consideration set is N = {1, 2} and that v0 = v1 = v2 = 1, so πk({1, 2}) = 33.3̄% for k = 0, 1, 2. Under the BAM, eliminating choice 2 results in πk({1}) = 50% for k = 0, 1. Suppose, however, that product 2 is available across town and that the customer’s attraction for product 2 from the alternative source is w2 = 0.5 ∈ [0,v2]. Then his choice set, when product 2 is not offered, is in reality S = {1, 2′} with 2′ representing product 2 in the alternative location with shadow attraction w2. A customer who arrives to the store has attractiveness v0 = v1 = 1 and w2 = 0.5, resulting in

π0({1}) = 1.5

2.5 = 60%, π1({1}) =

2.5 = 40%.

This formulation may also help mitigate the optimism of the BAM in inter-temporal models where a choice j ∈ S′ may become available at a later time. In this case w2 is the shadow attraction of choice 2 discounted by time and the risk that it may not be available in the future.

To formally define the general attraction model (GAM), introduced by Gallego, Ratliff and Shebalov [9], we assume that in addition to the attraction values vk,k ∈ N = {1, . . . ,n}, there are shadow attraction values wk ∈ [0,vk],k ∈ N such that for any subset S ⊂ N

πj(S) = vj

v0 + W(S′) + V (S) j ∈ S, (2)

where W(R) = ∑ j∈R wj for all R ⊂ N. For the GAM,

π0(S) = v0 + W(S

′)

v0 + W(S′) + V (S)

is the probability of the no-purchase alternative. The case wk = 0,k ∈ N recovers the BAM, while the case wk = vk,k ∈ N recovers the independent demand model (IDM). As with the BAM it is possible to normalize the parameters so that v0 = 1 when v0 > 0. The parsimonious GAM (p-GAM) is given by wj = θvj ∀j ∈ N for some θ ∈ [0, 1]. The p-GAM can serve to test the competitiveness of the market, by testing the hypothesis H0 : θ = 0 or H0 : θ = 1 against obvious alternatives to determine whether one is better off deviating, respectively, from either the BAM or the IDM.

There is an alternative, perhaps simpler, way of presenting the GAM by using the following transformation: ṽ0 = v0 + w(N) and ṽk = vk −wk,k ∈ N. For S ⊂ N, let Ṽ (S) =

∑ j∈S ṽj.

With this notation the GAM becomes:

πj(S) = vj

ṽ0 + Ṽ (S) ∀j ∈ S and π0(S) = 1 −π(S). (3)

For S ⊂ T we will use the notation πS+ (T) = ∑ j∈S+ πj(T).

Gallego, Ratliff and Shebalov [9] proposed the following generalization to the Luce Axioms:

• Axiom 1’: If πi({i}) ∈ (0, 1) for all i ∈ T, then for any non-negative R ⊂ S ⊂ T

πR(T)

πR(S) = 1 −

∑ j∈T−S

(1 −θj)πj(T)

for some set of values θj ∈ [0, 1],j ∈ N.

• Axiom 2: If πi({i}) = 0 for some i ∈ T, then for any S ⊂ T such that i ∈ S

πS(T) = πS−{i}(T −{i}).

They call the set of Axioms 1’ and 2 the Generalized Luce Axioms (GLA). The special case θj = 0 for all j recovers the original Luce Axiom 1, resulting in the BAM, while the case θj = 1 for all j reduces to the IDM. Gallego et al [9] establish the following connection between the GLA and the GAM. The proof of Theorem 1 can be found in the Appendix.

Theorem 1 A Discrete Choice Model satisfies the GLA if and only if is of the GAM form.

We will now show that the GAM also arises as the limit of the nested logit (NL) model. The NL model was originally proposed in Domencich and McFadden (1975) [5], and later refined in McFadden(1978) [15], where it is shown that it belongs to the class of Generalized Extreme Value (GEV) family of models. Under the nested choice model, customers first select a nest and then an offering within the nest. The nests may correspond to product categories

and the offerings within a nest may correspond to different variants of the product category. As an example, the product categories may be the different modes of transportation (car vs. bus) and the variants may be the different alternatives for each mode of transportation (e.g., the blue and the red buses). As an alternative, a product category may consist of a single product, and the variants may be different offerings of the same product by different vendors. We are interested in the case where the random utility of the different variants of a product category are highly correlated. The NL model, allows for correlations for variants in nest i through the dissimilarity parameter γi. Indeed, if ρi is the correlation between the random utilities of nest i offerings, then γi =

√ 1 −ρi is the dissimilarity parameter for the nest. The

case γi = 1 corresponds to uncorrelated products, and to a BAM. The MNL is the special case where the idiosyncratic part of the utilities are independent and identically distributed Gumbel random variables.

Consider now a NL model where the nests corresponds to individual products offered in the market. Customers first select a product, and then one of the vendors offering the selected product. Let Ok be the set of vendors offering product k, Sl be the set of products offered by vendor l, and vkl be the attraction value of product k offered by vendor l. Under the NL model, the probability that a customer selects product i ∈ Sj is given by

πi(Sj) =

(∑ l∈Oi v

1/γi il

)γi v0 +

∑ k

(∑ l∈Ok v

1/γk kl

)γk v 1/γi ij∑

l∈Oi v 1/γi il

, (4)

where the first term is the probability that product i is selected and the second term the probability that vendor j is selected. Many authors, e.g., [Greene(1984)], use what is know as the non-normalized nested models where vijs are not raised to the power 1/γi. This is

sometimes done for convenience by simply redefining vkl ← v 1/γk kl . There is no danger in using

this transformation as long as the γs are fixed. However, the normalized model presented here is consistent with random utility models, see [Train(2002)], and we use the explicit formulation because we will be taking limits as the γs go to zero.

It is easy to see that πi(Sj) is a BAM for each j, when γi = 1 for all i. This case can be viewed as a random utility model, where an independent Gumbel random variable is associated with each product and each vendor. Consider now the case where γi ↓ 0 for all i. At γi = 0, the random utilities of the different offerings of product i are perfectly and positively correlated. This makes sense when the products are identical and price and location are the only things differentiating vendors. When these differences are captured by the deterministic part of the utility, then a single Gumbel random variable is associated with each product. In the limit, customers select among available products and then buy from the most attractive vendor. If several vendors offer the same attraction value, then customers select randomly among such vendors. The next result shows that a GAM arises for each vendor, as γi ↓ 0 for all i.

Theorem 2 The limit of (4) as γl ↓ 0 for all l is a GAM. More precisely, there are attraction values akj, ãkj ∈ [0,akj], and ã0j ≥ 0, such that

πi(Sj) = aij

ã0j + ∑ k∈Sj ãkj

∀i ∈ Sj ∀j.

This shows that if customers first select the product and then the vendor, then the NL choice model becomes a GAM for each vendor as the dissimilarity parameters are driven down to zero. The proof of this result is in the Appendix. Theorem 2 justifies using a GAM when some or all of the products offered by a vendor are also offered by other vendors, and customers have a good idea of the price and convenience of buying from different vendors. The model may also be a reasonable approximation when products in a nest are close substitutes, e.g., when the correlations are high but not equal to one. Although we have cast the justification as a model that arises from external competition, the GAM also arises as the limit of a NL model where a firm has multiple versions of products in a nest. At the limit, customers go for the product with the highest attraction within each nest. Removing the product with the highest attraction from a nest shifts part of the demand to the product with the second highest attraction , and part to the no-purchase alternative. Consequently a GAM of this form can be used in Revenue Management to model buy up behavior when there are no fences, and customers either buy the lowest available fare for each product or do not purchase.

2.4 Mixtures of BAMs

It can be shown that any discrete choice model that arrises from a random utility model can be approximated to any degree of accuracy by a mixture of BAMs, see McFadden and Train (2000) [17]. Unfortunately, as we will later see, the mixture of logits model leads to difficulties in optimization when used in RM or assortment planning.

2.5 Markov Driven Discrete Choice Models

Blanchet, Gallego and Goyal (2013), see [2], consider a Markov Driven Discrete Choice Model (MDDCM) that is characterized by the probability distribution qi = πi(N), i ∈ N+ and by a product substitution matrix Q = (qij), i,j ∈ N+. The vector q of probabilities (known as first choice probabilities) corresponding to the choice distribution that arises when all the products are available. The substitution probability qij is the probability that a customer whose first choice demand is i will substitute to product j when i is unavailable, with qii = 0 for i 6= 0. For i 6= 0, qi0 representing the probability that the customer substitutes i for the no-purchase alternative. We define q0i = 0 for all i 6= 0 and q00 = 1 as there is no substitution from the no-purchase alternative. Mathematically, we can define the substitution matrix for i,j ∈ N, i 6= j, via the equation qij = (πj(N − i) − qj)/qi, where N − i = {j ∈ N : j 6= i}. Notice that qij is the rate at which customers substitute to j when the find i unavailable. A discrete choice model arises under the Markovian assumption that if j is also unavailable, then the customer (who originally preferred i) will now substitute according to qjk for k 6= j. As we will explain later, it is relatively easy to compute the sales rate π(S) and the revenue rate r(S) associated with any subset S ⊂ N. Moreover, it is also easy to find the assortment that maximizes r(S). One advantage of the MDDCM is that it is very easy to do assortment planning and RM optimization with it. Also, the MDDCM can be used to accurately approximate the behavior of a mixture of BAMs. This approximation, together with the ease of optimization, allows for efficient heuristics for RM and assortment planning for the mixture of BAMs.

2.6 Revenues and Efficient Assortments

We assume without loss of generality that the products are labeled so that their associated fares are decreasing1 in j. More precisely, the fares are given by p1 ≥ p2 ≥ . . . ≥ pn. For any subset S ⊂ N, let r(S) =

∑ k∈S pkπk(S) denote the expected revenue when set S is offered.

We will now present a number of examples, and for each example, we will look into the collection (π(S),r(S)),S ⊂ N, representing the sales probability and the expected revenue from offering set S ⊂ N. Certain of these sets will be special and play an important role in revenue management.

Example 1. (BAM) Assume that customers have linear sensitivities to price and quality: βp = −1 and βq = 1000. Then a product with price p and quality q has mean utility µ = βpp + βqq = −p + 1000q. If φ = .01 then the attractiveness of the product is v = exp(.01(−p + 1000q)). Table 1 shows the price and quality of three different products as well as µ and v. Table 2 shows the sale probability π(S) and the expected revenue r(S) of all the eight subsets, in increasing order of π(S) assuming that v0 = 0. Some subsets in Table 2 are in bold and correspond to the efficient sets: E0 = ∅, E1 = {1} and E2 = {1, 2}. We will later give a precise definition of efficiency but roughly speaking the efficient sets can be graphically represented as those that lie in the least, increasing concave majorant of the graph (π(S),r(S)),S ⊂ N as can be seen in Figure 1, where a graph of the (π(S),r(S)) in increasing order of π(S) as well as the upper concave majorant that goes through the efficient sets.

(p1,q1) (p2,q2) (p3,q3) (1000, 1) (850, 0.9) (650, 0.5) (µ1,v1) (µ2,v2) (µ3,v3) (0, 1.0) (50, 1.65) (−150, 0.22)

Table 1: Parameters of MNL model with βp = −1, βq = 1000 and v0 = 1

S π(S) r(S) ∅ 0% $0.00 {3} 18% $118.58 {1} 50% $500.00 {1,3} 55% $515.06 {2} 62% $529.09 {2,3} 65% $538.48 {1,2} 73% $658.15 {1,2,3} 74% $657.68

Table 2: Sets, Sale Rates and Revenue Rates (Efficient Sets are in Bold)

Example 2. (GAM) We reconsider Example 1 with w1 = 0, w2 = 0.5 and w3 = 0.2. This means that there are negative externalities associated with not offering products 2 and 3. The sales rates and revenue rates are given by Table 3 and also in Figure 2. The efficient sets are now E0 = ∅ and Ei = Si = {1, . . . , i} for i = 1, 2, 3. It is interesting to contrast Tables 2

1We will use increasing and decreasing in the weak sense unless noted otherwise.

$0.00

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Re ve nu

e Ra

Demand Rate

Revenue Efficient Fron<er

Figure 1: Example 1: The inefficient sets are the diamonds below the curve

and 3. For the BAM of Table 2 offering set S2 = {1, 2} results in revenue r(S2) = $658.15. However offering S2 = {1, 2} in Example 2 results in a significant reduction of revenue since more of the demand from Product 3 is lost under the GAM than under the BAM. Notice that offering set S3 = {1, 2, 3} hurts revenues, relative to offering set S2, under the BAM but helps under the GAM. This is because under the BAM the incremental revenue from adding fare 3 to set S2 is smaller than the loss from demand cannibalization. In contrast, under the GAM, the incremental revenue from adding fare 3 to set S2 is larger than the loss from demand cannibalization.

S π(S) r(S) ∅ 0% $0.00 {3} 13% $84.17 {1} 37% $370.37 {1,3} 45% $420.48 {2} 58% $491.94 {2,3} 65% $538.48 {1,2} 69% $623.95 {1,2,3} 74% $657.68

Table 3: Sets, Sale Rates and Revenue Rates (Efficient Sets are in Bold)

Figure 1 shows a graph of the (π(S),r(S)) in increasing order of π(S) as well as the efficient frontier (upper concave majorant that goes through the efficient sets).

Example 3. (Mixtures of BAMs) Consider the following mixture of BAMs with three products and two customer classes taken from Rusmevichientong, Shmoys and Topaloglu [18]. The fares are p1 = 80,p2 = 40 and p3 = 30. An arriving customer may follow one of two BAMs, each with probability 0.5. BAM-1 has attractiveness (1, 5, 20, 1), respectively, for the no-purchase alternative and for the three products. BAM-2 has attractiveness (5, 1, 50, 50), respectively, for the no-purchase alternative and for the three products. Let πi(S) and ri(S) denote, respectively, the sales probability and the expected revenue associated with BAM-i, for i = 1, 2. Given any offer set S, let π(S) = 0.5π1(S) + 0.5π2(S) be the probability of sale for the

$-‐

$100.00

$200.00

$300.00

$400.00

$500.00

$600.00

$700.00

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Re ve nu

e Ra

Demand Rate

Revenue Efficient Fron=er

Figure 2: Example 2: The inefficient sets are the diamonds below the solid curve

mixture of the two BAMs and r(S) = 0.5r1(S) + 0.5r2(S) be the expected revenue for the mixture of the two BAMs. Table 4 shows (S,π(S),r(S)) for all S ⊂{1, 2, 3}. Notice that the efficient sets are E0 = ∅, E1 = {1} and E2 = {1, 3}. Figure 3 shows a graph of the (π(S),r(S)) in increasing order of π(S) as well as the efficient frontier.

S π(S) r(S) ∅ 0% $0.00

{1} 50% $40.00 {3} 70% $21.14

{1,3} 88% $44.82 {2} 93% $37.23 {1,2} 94% $41.65 {2,3} 95% $35.53 {1,2,3} 96% $39.66

Table 4: Sets, Sale Rates and Revenue Rates for Example 3

Here is another example of a mixture of BAMs where the efficient sets are not nested.

Example 4. (Mixtures of BAMs) Consider the following mixture with four products and three customer classes. The fares are p1 = 11.50, p2 = 11.00, p3 = 10.80 and p4 = 10.75. The attractiveness of the BAMs are, respectively, (1, 5, 2, 300, 1), (1, 6, 4, 300, 1) and (1, 0, 1, 300, 7), with the first component representing the attractiveness of the no-purchase alternative. An arriving customer has 1/6 probability of belonging to market segment 1, 1/3 probability to market segment 2 and 1/2 probability of belonging to market segment 3. Table 5 lists the efficient sets and the corresponding sales and revenue rates. The efficient sets are E0 = ∅, E1 = {1}, E2 = {1, 2}, E3 = {1, 4}, E4 = {1, 2, 4} and E5 = {1, 2, 3}. Notice that E3 does not contain E2 and E5 does not contain E4. Moreover, this is an instance where the number of efficient sets m = 5 > n = 4.

$-‐ $5.00 $10.00 $15.00 $20.00 $25.00 $30.00 $35.00 $40.00 $45.00 $50.00

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

Re ve nu

e Ra

Demand Rate

Revenue Efficient Fron=er

Figure 3: Example 3: The inefficient sets are the diamonds below the solid curve

S π(S) r(S) ∅ 0% $0.00

{1} 42.5% $4.88 {1,2} 69.9% $7.83 {1,4} 87.2% $9.65 {1,2,4} 89.8% $9.90 {1,2,3} 99.7% $10.77

Table 5: Sets, Sale Rates and Revenue Rates for Example 4

3 Efficient Sets and Assortment Optimization

In this section we formally define efficient sets and show that for certain choice models the efficient sets have desirable properties such as being nested or nested by fares. We also look at assortment optimization, which is essentially the problem of finding the assortment S ⊂ N with highest revenue r(S). The reader not interested in the technical details may skip this section on first reading.

For any ρ ∈ [0, 1] consider the linear program

R(ρ) = max ∑ S⊂N

r(S)t(S) (5)

subject to ∑ S⊂N

π(S)t(S) ≤ ρ ∑ S⊂N

t(S) = 1

t(S) ≥ 0 ∀S ⊂ N.

The linear program selects a convex combination of all possible actions (subsets of N) to maximize the expected revenue that can be obtained subject to the bound ρ on the probability of sale. Later we will see that ΛR(c/Λ) is an upper bound on the expected revenue when we

expect Λ customers to arrive and the capacity is c. Thus the ratio of capacity to potential demand c/Λ will play the role of ρ.

The decision variables in the linear program (5) are the proportion of time t(S) ≥ 0 that each subset S ⊂ N is offered for sale. The following results follow from the standard theory of parametric linear programming.

Proposition 1 R(ρ) is increasing, concave, and piece-wise linear.

We can trace the efficient frontier (ρ,R(ρ)), 0 ≤ ρ ≤ 1, by solving the linear program (5) parametrically (perhaps using the dual formulation and column generation), for all 0 ≤ ρ ≤ 1. In some cases the column generation step may be NP-hard, but for now we will assume that it is possible to solve the problem for all ρ ∈ [0, 1]. Sets S ⊂ N such that r(S) < R(π(S)) are said to be inefficient as the lie below the efficient frontier (ρ,R(ρ)) at ρ = π(S). Sets S such that R(π(S)) = r(S) are said to be efficient as the pair (π(S),r(S)) lies in the efficient frontier at ρ = π(S). Equivalently, a set S is efficient if t(S) = 1 is an optimal solution to the linear program for ρ = π(S), as then R(ρ) = r(S). For any choice model, let C = {E0,E1, . . . ,Em} be the collection of efficient sets. Let πj = π(Ej) and rj = r(Ej) for all j ∈ M = {0, 1, . . . ,m}. We will assume from now on that the efficient sets are sorted in increasing order of πj,j ∈ M. Let uj = (rj − rj−1)/(πj −πj−1) be the slope joining (πj−1,rj−1) and (πj,rj). Because R(ρ) is increasing concave and linear between the points (πj,rj), it follows that u1 > u2 > ... > um ≥ 0, and that

R(ρ) = rj−1 + uj(ρ−πj−1) ∀ ρ ∈ [πj−1,πj] ∀j = 1, 2, . . . ,m.

Notice also that R(ρ) = R(πm) = rm for all ρ > πm, so the definition of R(ρ) can be extended to all ρ ≥ 0.The upper concave envelopes of Figures 1, 2 and 3 are all of these form.

When Ej−1 ⊂ Ej, then uj is the marginal contribution to revenue of the fare or fares added to Ej−1 to form Ej. It is possible to show, through dual feasibility, that Ej+1 is a maximizer of the ratio (r(S) − rj)/(π(S) −πj) over all sets S such that π(S) > πj if the resulting ratio is non-negative (otherwise Ej = Em is the last efficient set).

Let us reconsider the choice models of the previous section to look more closely into the efficient sets. For Example 1, the efficient sets are E0 = ∅, E1 = {1} and E2 = {1, 2}. For Example 2, the efficient sets are E0 = ∅, E1 = {1}, E2 = {1, 2} and E3 = {1, 2, 3}, while for Example 3, the efficient sets are E0 = ∅, E1 = {1} and E2 = {1, 3}. Notice that in these examples the efficient sets are nested, and in Examples 1 and 2 the efficient sets are also nested-by-fare. In Example 3, the efficient sets are not nested-by fare since the efficient set E2 = {1, 3} skips product 2 with p2 > p3. Notice that the number of efficient sets is at most n in Examples 1,2 and 3. In contrast, the efficient sets of Example 4 are E0 = ∅, E1 = {1}, E2 = {1, 2}, E3 = {1, 4}, E4 = {1, 2, 4} and E5 = {1, 2, 3}, so the efficient sets are not nested and the number of efficient sets is greater than n.

Talluri and van Ryzin [19], who first identified efficient sets for discrete choice models, used a slightly different definition. Under their definition, a set S is inefficient, if it is possible to form a convex combination of other sets that result in a strictly higher revenue with the same sale probability or if it is possible to form a convex combination of other sets that results in

a strictly lower sale probability but the same revenue. They define efficient sets as those that are not inefficient. The two definitions are essentially equivalent, although it is possible for our definition to include sets that would be deemed inefficient by the definition in [19]. As an example, suppose that rm = rm−1 with πm > πm−1. Then (πm,rm) is in the efficient frontier, but it is deemed inefficient by [19], because the same revenue rm−1 = rm can be obtained with a lower sale probability. This is an innocuous subtlety, because in this case the efficient set Em will not be selected by an optimization algorithm unless we specify that we want to provide better customer service without sacrificing revenue.

The following result is key in simplifying the optimization problem resulting from the dynamic programs in the next section. In essence it reduces the optimization from 2n subsets S ⊂ N to just m, the number of efficient sets.

Theorem 3 For any z ≥ 0,

max S⊂N

[r(S) −zπ(S)] = max S∈C

[r(S) −zπ(S)] = max j∈M

[rj −zπj].

Proof: It is enough to show that any maximizer of [r(S)−zπ(S)] must be efficient. Suppose for a contradiction that r(T) − zπ(T) > r(S) − zπ(S) for all S 6= T ⊂ N and that T is not an efficient set. Then there exist a set of non-negative weights t(S) : S ⊂ N such that R(π(T)) =

∑ S⊂N t(S)r(S) > r(T) with

∑ S⊂N t(S) = 1 and

∑ S⊂N t(S)π(S) ≤ π(T).

Consequently,

r(T) −zπ(T) < R(π(T)) −zπ(T) ≤ ∑ S⊂N

[r(S) −zπ(S)]t(S) < r(T) −zπ(T),

where the first inequality follows from the inefficiency of T, the second from ∑ S⊂N t(S)π(S) ≤

π(T) and the third from the claimed optimality of T and the fact that t(T) < 1 as T is not efficient. Since the displayed equation is a contradiction, it must be that T is efficient, and consequently we can restrict the maximization to C resulting in maxS∈C[r(S) − zπ(S)] = maxj∈M [rj −zπj].

Theorem 3 tell us that to find the best set to offer when the marginal cost is z, we should search for an efficient set with the largest value of rj − zπj. Now rj − zπj > rj−1 − zπj−1 if and only if z < uj, where uj is the slope joining (πj−1,rj−1) and (πj,rj). From Proposition 1 the uj’s are non-negative and decreasing in j = 1, . . . ,m. To visualize this graphically, draw the efficient points (πj,rj),j ∈ M and a line with slope z ≥ 0 from the origin. An optimal solution is to select the efficient set with index

j(z) = max{j ∈ M : uj > z}.

From this it is clear that j(z) < m for all z ≥ 0 whenever rm = rm−1, because um = 0 ≤ z. This shows that it is indeed innocuous to use our more liberal definition of efficient sets, which is more consistent with the definition of efficiency used in portfolio theory in Finance. However, if z = 0, we may want to offer Em instead of Em−1 when the slope um = 0. This is because more customers are served by offering Em resulting in the same revenue as Em−1. To do this, we just modify the definition of j(z) to j(z) = max{j ∈ M : uj ≥ z}.

Consider the linear program

RC(ρ) = max ∑ j∈M

rjtj

subject to ∑ j∈M

πjtj ≤ ρ ∑ j∈M

tj = 1

tj ≥ 0 ∀j ∈ M.

Clearly RC(ρ) ≤ R(ρ) for all ρ ∈ [0, 1] as the linear program RC(ρ) is more restricted. The next result shows that in fact RC(ρ) = R(ρ) so the optimization problem can be reduces to the collection C of efficient sets.

Theorem 4 RC(ρ) = R(ρ) for all ρ ∈ [0, 1].

Proof: Consider the dual of the linear program defining R(ρ) in terms of variables z and β. The problem is to minimize ρz +β subject to π(S)z +β ≥ r(S), ∀S ⊂ N, z ≥ 0. The solution in terms of β is to set β = maxS⊂N [r(S) − zπ(S)]. By Theorem 3, β = maxj∈M [rj − zπj], so the dual problem is reduced to minimizing ρz + β subject to πjz + β ≥ rj,j ∈ M. The result follows by recognizing that this is the dual of the liner program defining RC(ρ).

3.1 Nested-by-Fare Efficient Sets

A choice model is said to have the nested-by-fare property if p1 > p2 > .. . > pn and the collection of efficient sets C is contained in the collection {S0,S1, . . . ,Sn} where S0 = ∅ and Sj = {1, . . . ,j} for j = 1, . . . ,n. The next theorem, due to Talluri and van Ryzin [19], provides necessary and sufficient conditions for the nested-by-fare property to hold. This result is important because it justifies in some important cases the optimality of the nested- by-fare structure and because it reduces the number of efficient fares to at most n. The theorem requires the following simple idea that compares the revenue performance of two n-dimensional vectors x and y with the same sums x[1,n] = y[1,n] = ρ, where we use the notation x[1, i] =

∑i j=1 xj. It turns out that if the partial sums of x dominate the partial

sums of y, then p′x ≥ p′y for all vectors p with decreasing components. More precisely, if x[1, i] ≥ y[1, i] for all i = 1, . . . ,n − 1, and p1 ≥ p2 ≥ . . . ≥ pn, then

∑n i=1 pixi ≥

∑n i=1 piyi.

This idea is related to the powerful concept of majorization popularized by Marshal and Olkin [14]. Here we will use it to compare the revenues associated with the sale probability vector yi = πi(T), i = 1, . . . ,n induced by a set T ⊂ N to the sales induced by a convex combination of the nested sets S0,S1, . . . ,Sn. More precisely, consider a convex combination α0,α1, . . . ,αn of non-negative values adding to one such that

∑n k=0 αkπ(Sk) = π(T) and let xi =

∑n k=0 αkπi(Sk)

be the sale of product i induced by this combination. Then x[1, i] ≥ y[1, i] i = 1, . . . ,n − 1 and x[1,n] = y[1,n] imply that the revenue from the convex combination will dominate the revenue under set T.

Theorem 5 (Taluri and van Ryzin [19])A choice model has the nested-by-fare order property if and only if π(S) is increasing in S, and for every subset T there is a convex combination of π(S0),π(S1), . . . ,π(Sn) with partial sums of sales that dominate the partial sums of sales under T with total sales equal to total sales under T .

When such a convex combination exists, it favors sales at higher fares and thus results in higher expected revenues. Talluri and van Ryzin [19] show that the conditions of Theorem 5 are satisfied by the MNL and the independent demand model. Here we show that the nested- by-fare property holds more generally by showing that it holds for the parsimonious GAM model (P-GAM) with wi = θvi for all i ∈ N for some θ ∈ [0, 1]. This model reduces to the BAM when θ = 0 and to the IDM when θ = 1. The proof of this result is in the Appendix.

Proposition 2 The efficient sets are nested-by-fares for the P-GAM.

While Theorem 5 assures us that C ⊂ {S0,S1, . . . ,Sn}, the converse does not necessarily hold as there may be some sets in the collection C that are not efficient.

Proposition 3 Consider the P-GAM for a system with strictly decreasing fares p1 > p2 > ... > pn. Then a necessary and sufficient condition for all nested sets to be efficient is pn ≥ (1 −θ)r(Sn−1).

Proof: Consider the plot (π(Sj),r(Sj)),j = 0, . . . ,n. For the P-GAM model we can write r(Sj) as a convex combination of r(Sj−1) and pj/(1−θ) of the form r(Sj) = αjr(Sj−1) + (1− αj)pj/(1 −θ), where αj = (ṽ0 + Ṽ (Sj−1))/(ṽ0 + Ṽ (Sj)). Consequently r(Sj) ≥ r(Sj−1) if and only if pj ≥ (1 − θ)r(Sj−1). Suppose now that pn ≥ (1 − θ)r(Sn−1). We will first show that the plot is increasing, or equivalently that pj ≥ (1 − θ)r(Sj−1) for all j. Suppose that there is a j such that pj < (1 − θ)r(Sj−1). Then pj/(1 − θ) < r(Sj) < r(Sj−1) and consequently pj+1 < pj < (1 −θ)r(Sj). We can then repeat the argument to show that pk < (1 −θ)r(Sk−1) for all k ≥ j, but for k = n, this contradicts pn ≥ (1 −θ)r(Sn−1).

Now, π(Sj) = αjπ(Sj−1) + (1−αj)/(1−θ). This allow us to write the marginal revenue as

uj = r(Sj) − r(Sj−1) π(Sj) −π(Sj−1)

= pj − (1 −θ)r(Sj−1) 1 − (1 −θ)π(Sj−1)

(6)

To establish concavity, it is enough to show that uj is decreasing in j. By using the known convex combinations of r(Sj) and of π(Sj) we see that

uj+1 = uj − pj −pj+1

αj(1 −πj−1) < uj

where the inequality follows from pj > pj+1.

For the BAM, Proposition 3 implies that all nested sets are efficient as long as pn ≥ r(Sn−1). At the other extreme, for the IDM all nested sets are efficient as long as pn ≥ 0. Another consequence is that for the P-GAM, the efficient sets are S0,S1, . . . ,Sj∗ where

j∗ = max{j ∈ N : pj ≥ (1 −θ)r(Sj−1)}.

All fares pj,j > j ∗ are inefficient and need not be considered.

3.2 Assortment Optimization under the MDDCM

Suppose we want to an offer set S that maximizes r(S) − zπ(S) over all S ⊂ N, where the choice model is the MDDCM characterized by (q,Q). This Markovian assumption greatly simplifies the optimization problem and allows the use of successive approximations to find the optimal revenues and the optimal offer set under the Markovian assumption. Indeed, let g0i = pi − z for i = 1, . . . ,n, where for convenience we define g00 = p0 = 0. Now consider the successive approximation scheme gk+1 = max(Qgk,p), where now p is the vector of fares with p0 = 0. Blanchet, Gallego and Goyal [2] show that g

k is monotonically increasing and converges to a vector g. The vector g represents the maximum revenue that can be obtained from each customer request under the Markov Chain discrete choice model. Blanchet, Gallego and Goyal [2] show that under the Markovian assumption an optimal offer set is is given by S = {i : gi = g0i}, and that q′g = r(S) −zπ(S). This method can be used to identify efficient sets by solving the problem for all z ≥ 0, resulting in Ej,j ∈ M. The method can be used as a heuristic to approximately compute the collection of efficient sets for discrete choice models, such as the mixture of BAMs, for which the optimization is NP-hard.

4 Dynamic Capacity Allocation Models

In this section we consider models where time is considered explicitly. Modeling time explicitly allows for time-varying arrival rates and time-varying discrete choice models. We assume that customers arrive to the system according to a time heterogenous Poisson process with intensity λt, 0 ≤ t ≤ T where T is the length of the horizon, and t represents the time-to-go. Then the number of customers that arrive during the last t units of time, say Nt, is a Poisson random variable with mean Λt =

∫ t 0 λsds.

We assume that customers arriving at time t select product k from the offered set, say S with probability πkt(S) given by the discrete choice model prevalent at time-to-go t. Let V (t,x) denote the maximum expected revenue that can be attained over the last t units of the sale horizon with x units of capacity. Assume that at time t we offer set S ⊂ N = {1, . . . ,n} and keep this set of fares open for δt units of time. If λtδt << 1, the probability that a customer arrives and requests product k ∈ S is approximately λtπkt(S)δt so

V (t,x) = max S⊂N

∑ k∈S {λtδtπkt(S)[pk + V (t− δt,x− 1)] + (1 −λtδt

∑ k∈S

πkt(S))V (t− δt,x)} + o(δt)

= V (t− δt,x) + λtδt max S⊂N

∑ k∈S

πkt(S)[pk − ∆V (t− δt,x)] + o(δt)

= V (t− δt,x) + λtδt max S⊂N

[rt(S) − ∆V (t− δt,x)πt(S)] + o(δt) (7)

for x ≥ 1 with boundary conditions V (t, 0) = V (0,x) = 0, t ≥ 0, x ∈ N where rt(S) =∑ k∈S pkπkt(S), πt(S) =

∑ k∈S πkt(S) and ∆V (t− δt,x) = V (t− δt,x) −V (t− δt,x− 1).

The value function V (t,x) is often computed approximately by solving a discrete time dynamic program based on (7) that involves rescaling time and the arrival rates, using δt = 1, and dropping the o(δt) term. Time can be rescaled by a positive real number, say a, such that T ← aT is an integer by setting λt ← 1aλt/a, πjt(S) ← πj,t/a(S). The resulting dynamic program is given by

V (t,x) = V (t− 1,x) + λt max S⊂N

[rt(S) − ∆V (t− 1,x)πt(S)], (8)

with the same boundary conditions. Formulation (8) is due to Talluri and van Ryzin [19]. It reduces to the Lee and Hersh [11] model when demands are independent. Alternatively, instead of rescaling (7) we can subtract V (t−δt,x) from both sides of equation (7), divide by δt and take limits as δt ↓ 0 to obtain the Hamilton-Jacobi-Bellman equation:

∂V (t,x)

∂t = λt max

S⊂N [rt(S) − ∆V (t,x)πt(S)] (9)

with the same boundary conditions.

The generic optimization problem in formulations (8) and (9) is of the form

max S⊂N

[rt(S) −zπt(S)]

where z ≥ 0 is the marginal value of capacity. Since there are 2n subsets S ⊂ N = {1, . . . ,n} the optimization requires the evaluation of a large number of subsets, and the problem has to be solved for different values of z as the marginal value of capacity changes with the state (t,x) of the system. One may wonder if there is a way to simplify the problem by reducing the number of subsets that we need to look at. From Theorem 3, we know that the optimization problem can be reduces to maxj∈Mt [rjt − zπjt], where Mt is the index of the collection of efficient sets at time t, rjt = rt(Ejt) and πjt = πt(Ejt) and Ejt is the jth efficient for the discrete choice model prevalent at time-to-go t. This greatly simplifies the optimization step in the dynamic programs (8) and (9). For convenience we will refer to action j when offering efficient set Ejt,j ∈ Mt. If the problem of finding the efficient sets is NP-hard, then the Markov Chain approximation suggested in [2] can be used for positive values of z ≥ 0, to heuristically identify a collection of sets that are nearly efficient and then use this collection in the calculation.

Using the notion of efficient sets, formulation (8) reduces to

V (t,x) = V (t− 1,x) + λt max j∈Mt

[rjt − ∆V (t− 1,x)πjt], (10)

while formulation (9) reduces to

∂V (t,x)

∂t = λt max

j∈Mt [rjt − ∆V (t,x)πjt]. (11)

Just as in the case of the independent demand model, the marginal value ∆V (t,x) is increasing in t and decreasing in x. For formulation (10), let

a(t,x) = arg max j∈Mt

[rjt − ∆V (t− 1,x)πjt]. (12)

For formulation (11) the definition of a(t,x) is the same except that we use ∆V (t,x) instead of ∆V (t− 1,x) in the right hand side.

Notice that rjt − ∆V (t − 1,x) > rj−1,t − ∆V (t − 1,x), so offering action j (set Ejt) is better than offering action j−1 if and only if ujt = (rjt−rj−1,t)/(πj,t−πj−1,t) > ∆V (t−1,x). Moreover, since the ujts are non-negative and decreasing in j, it follows that it is optimal to offer action

a(t,x) = max{j : ujt > ∆V (t− 1,x)},

for formulation (8). The formula for a(t,x) for formulation (9) requires ∆V (t,x) instead of ∆V (t− 1,x). The following result is valid for both formulations (10) and (11).

Theorem 6 (Taluri and van Ryzin ([19])) It is optimal to offer action j = a(t,x) (efficient set Ea(t,x),t) at state (t,x). Moreover, a(t,x) is decreasing in t and increasing in x over every time interval where the choice model is time invariant.

As t increases ∆V (t− 1,x) increases and the optimal solution shifts to efficient sets with a smaller sale probability. In contrast, as x increases, ∆V (t−1,x) decreases, and the optimal solutions shifts to efficient sets with larger sale probability. In general, we cannot say that we close lower fares when t is large (or open lower fares when x is large) because the efficient sets need not be nested-by-fare. There is a large class of choice models, however, for which the efficient sets are nested-by-fare. We have shown in Section 3 that for the parsimonious GAM (p-GAM) with wj = θvj,j ∈ N, θ ∈ [0, 1], the efficient sets are of the form Ej = Sj = {1, . . . ,j} for j = 0, . . . ,m ≤ n, so the efficient sets are nested-by-fares. Since the IDM and the BAM correspond to the cases θ = 0 and θ = 1, this implies that the collection of efficient sets are nested-by-fares for both the IDM and the BAM. If the efficient sets are nested-by- fare, then we can talk of opening and closing fares as the state dynamics change with the understanding that if a fare is open, then all higher fares will be open at the same time.

4.1 Dynamic Pricing Formulation with a Finite Price Menu

While πjt and rjt are respectively, the sales rate and the revenue rate under action j at time t, the average fare per unit sold under action j is qjt = rjt/qjt, where for convenience we define q0t = 0 for E0 = ∅. From the increasing concavity of R(ρ), it follows that qjt is decreasing in j ∈ Mt. This suggest that if t is large relative to x we should offer action 1 as this maximizes

the revenue per unit of capacity. However, offering action 1 may result in very slow sales and may lead to capacity spoilage. At the other extreme, offering action m, maximizes the revenue rate and this is optimal when capacity is abundant as this maximizes the revenue over the sales horizon. For other cases, a tradeoff is needed between the average sales price qjt and the demand rate λtπjt associated with action j that takes into account the marginal value of capacity at state (t,x). This leads to the following equivalent pricing formulation:

Then V (t,x) = V (t− 1,x) + max

j∈Mt λtπjt[qjt − ∆V (t− 1,x)]. (13)

The equivalent formulation for the continuous time model (11) is

∂V (t,x)

∂t = max

j∈Mt λtπjt[qjt − ∆V (t,x)] (14)

These formulation suggest that we are selecting among actions j ∈ Mt to maximize the sales rate λtπjt times the average fare qjt net of the marginal value of capacity. We can think of λtπjt as the demand rate associated with average fare qjt, which reduces dynamic capacity allocation models to dynamic pricing models with finite price menus.

5 Formulation as an Independent Demand Model

Recall from Chapter 1, that the discrete-time, independent demand, formulation with n fares pt1 > pt2 > .. . > ptn and demand rates λ1t, . . . ,λtn, is of the form:

V (t,x) = V (t− 1,x) + n∑ j=1

λjt[ptj − ∆V (t− 1,x)]+, (15)

with boundary conditions V (t, 0) = V (0,x) = 0. One may wonder, whether it is possible to transform formulation (10) for dependent demands into formulation (15) for demand formula- tion, by transforming data λt and (πjt,rjt), j = 0, 1, . . . ,m for the dependent demand model

into data (p̂jt, λ̂tj), j = 1, . . . ,m for the independent demand model.

One reason to wish for this is that such a transformation would allow computer codes available to solve (15) to solve (10). Another may be to try to use the transformed data as input to static models and then use static solutions or heuristics such as the EMSR-b.

While it is indeed possible to do the transformation from (10) into (15), there are really no computational benefits as the running time of both (10) and (15) are O(nct). Moreover, as we shall see later, using the transformed data as input for the static model is fraught with problems.

The data transformation is to set λ̂jt = λt[πjt −πj−1,t] and p̂jt = ujt = (rjt −rj−1,t)/(πjt − πj−1,t) for j ∈ Mt. The reader can verify that

∑j k=1 λ̂jt = λtπjt and that

∑j k=1 λ̂jtp̂jt = λtrjt.

This transformation is in effect creating artificial products with independent demands λ̂jt and

prices p̂jt for j ∈ Mt.

Theorem 7 Formulation (10) is equivalent to

V (t,x) = V (t− 1,x) + ∑ j∈Mt

λ̂jt[p̂jt − ∆V (t− 1,x)]+ (16)

with boundary conditions V (t, 0) = V (0,x) = 0.

Proof of Theorem 7: Let k(t,x) = max{j : p̂jt > ∆V (t − 1,x)}. It is therefore optimal to accept all of the artificial products such that j ≤ k(t,x). Notice that a(t,x) = k(t,x) on account of p̂jt = ujt. Aggregating artificial products 1, . . . ,a(t,x), results in action k(t,x) = a(t,x), corresponding to efficient set Ea(t,x),t. More precisely,

∑ j∈Mt

λ̂jt[p̂jt − ∆V (t− 1,x)]+ = ∑

j≤k(t,x) λ̂jt[p̂jt − ∆V (t− 1,x)]

= ∑

j≤a(t,x) λ̂jt[p̂jt − ∆V (t− 1,x)]

= ra(t,x),t −πa(t,x),t)∆V (t− 1,x) = max

j∈Mt [rjt −πjt∆V (t− 1,x)].

Consequently,

V (t,x) = V (t− 1,x) + ∑ j∈Mt

λ̂jt[p̂jt − ∆V (t− 1,x)]+ = V (t− 1,x) + λt max j∈Mt

[rjt −πjt∆V (t,x)].

The fare and demand transformations that map λt and (πjt,rjt),j ∈ Mt into (p̂jt, λ̂jt),j ∈ M have appeared in many papers but the original ideas go back to Kincaid and Darlin [10] as documented in Walczack et al. [20]. The fact that the transformation works for the dynamic program has lead some practitioners to conclude that the transformed data can be used as an input to static models where demands are treated as independent. In particular, Fiig et al. [6], and Walczack et al. [20], proposed feeding the transformed data and applying the EMSR-b heuristic, treating demands as independent under the low-to-high arrival pattern assumption. While this is a tempting heuristic, it would be wrong to expect that controls obtained this way will be close to optimal if demand dependencies are ignored. In particular, the low-to-high demand arrival pattern, is tantamount to assuming that Poisson demands with parameters λ̂jt will arrive low-to-high, but this are artificial demands from customers willing to buy under action j but not under action j−1. When capacity is allocated to this marginal customers, we cannot prevent some degree of demand cannibalization from customers willing to buy under action j − 1 into some of the fares in action j. We will return to this issue in Section 8.

6 Upper Bound on the Value Function

We will present an an upper bound on the value functions (7) an (8) for the case where the choice models are time invariant. The idea is based on approximate dynamic programming (ADP) using affine functions and it can be easily extended to the case where the choice models vary over time. It is well known that a dynamic program can be solved as a mathematical pro- gram by making the value function at each state a decision variable. This leads to the formula- tion V (T,c) = min F(T,c) subject to the constraints ∂F (t,x)/∂t ≥ λt[rj−∆F(t,x)πj] ∀(t,x) for all j ∈ M, where the decision variables are the class of non-negative functions F(t,x) that are differential in x with boundary conditions F(t, 0) = F(0,x) = 0 for all t ∈ [0,T] and all x ∈{0, 1, . . . ,c}.

While this formulation is daunting, it becomes easier once we restrict the functions to be of the affine form F̃(t,x) =

∫ t 0 βs(x)ds + xzt, zt ≥ 0. We will restrict ourselves to the

case βs(x) = βs for x > 0, βs(0) = 0, zt = z for t > 0 and z0 = 0. With this restriction, the partial derivative and marginal value of capacity have simple forms and the boundary conditions are satisfied. More precisely, ∂F̃(t,x)/∂t = βt for x > 0, ∆F̃(t,x) = z, for t > 0, F̃(t, 0) = F̃(0, t) = 0. This reduces the program to Ṽ (T,c) = min F̃(T,c) = min

∫T 0 βtdt + cz,

subject to βt ≥ λt[rj − zπj] ∀j ∈ M. Since we have restricted the set of functions F(t,x) to be affine, it follows that V (T,c) ≤ Ṽ (T,c). Since this is a minimization problem, the optimal choice for βt is βt = λt maxj∈M [rj − zπij] = λtg(z) where g(z) = maxj∈M [rj − zπj] is a decreasing convex, non-negative, piecewise linear function of z. Consequently the overall problem reduces to

Ṽ (T,c) = min z≥0

[ ∫ T

0 λtg(z)dt + cz] = min

z≥0 [ΛTg(z) + cz], (17)

where ΛT = ∫T

0 λtdt is the aggregate arrival rate over the sales horizon [0,T]. Notice that ΛTg(z) + cz is convex in z. If the discrete choice model is time varying, then gt(z) = maxj∈Mt [rjt −zπjt], result in

V̄ (T,c) = min z≥0

[∫ T 0 λtgt(z)dt + cz

] ,

where the objective function is also convex in z.

We can linearize (17) by introducing a new variable, say y, such that y ≥ rj − zπj for all j ∈ M and z ≥ 0, which results in the linear program:

Ṽ (T,c) = min z≥0

[ΛTy + cz],

subject to ΛTy + ΛTπjzj ≥ ΛTrj j ∈ M z ≥ 0,

where for convenience we have multiplied the constraints y + πjz ≥ rj,j ∈ M by ΛT > 0.

The dual of this problem is given by

Ṽ (T,c) = ΛT max ∑ j∈M

rjtj

subject to ΛT ∑ j∈M

πjtj ≤ c ∑ j∈M

tj = 1

tj ≥ 0 ∀j ∈ M.

This linear program decides the proportion of time , tj ∈ [0, 1], that each efficient set Ej is offered to maximize the revenue subject to the capacity constraint. The reader can verify that Ṽ (T,c) is closely related to the functions RC(ρ) and R(ρ) used to define efficient sets. The following result establishes this relationship.

Proposition 4 Ṽ (T,c) = ΛTRC(c/ΛT ) = ΛTR(c/ΛT ).

Example 5 Suppose that p1 = 1000,p2 = 600 and a BAM with v0 = v1 = v2 = e 1. We will

assume that the aggregate arrival rate over the sales horizon [0,T] = [0, 1] is ΛT = 40. We know that for this problem the efficient sets are E0 = ∅,E1 = {1} and E2 = {1, 2}. For c = 24, the optimal solution is t1 = 6/15 and t2 = 9/15 with 8 units of sales under action S1 and 16 units of sales under action S2 and revenue $20,800. Table 6 provides the upper bound Ṽ (T,c) for different values of c. Notice that sales under action S1, first increase and then decrease as c increases. Table 6 also reports sales under fare 1 and fare 2, and the reader can see that sales under fare 1 first increase and then decrease.

c Ṽ (T,c) t1 t2 sales S1 sales S2 fare 1 sales fare 2 sales 12 12,000 0.6 0.0 12 0 12 0 16 16,000 0.8 0.0 16 0 16 0 20 20,000 1.0 0.0 20 0 20 0 22 20,400 0.7 0.3 14 8 18 4 24 20,800 0.4 0.6 8 16 16 8 26 21,200 0.1 0.9 2 24 14 12 28 21,333 0.0 1.0 0 26.6 13.3 13.3 32 21,333 0.0 1.0 0 26.6 13.3 13.3

Table 6: Upper Bound and Optimal Actions for Fluid Model

7 Uni-directional Dynamic Programming Models

Like the Lee and Hersh model, formulations (8) and (9), implicitly assume that the capacity provider can offer any subset of fares at any state (t,x). This flexibility works well if customers are not strategic. Otherwise, customers may anticipate the possibility of lower fares and

postpone their purchases in the hope of being offered lower fares at a later time. If customers act strategically, the capacity provider may counter by imposing restrictions that do not allow lower fares to reopen once they are closed. Actions to limit strategic customer behavior are commonly employed by revenue management practitioners.

Let VS(t,x) be the optimal expected revenue when the state is (t,x), we are allowed to use any subset U ⊂ S of fares and fares cannot be offered once they are closed. The system starts at state (T,c) and S = N. If a strict subset U of S is used then all fares in U ′ = {j ∈ N : j /∈ U} are permanently closed and cannot be offered at a later state regardless of the evolution of sales. To obtain a discrete time counterpart to (8), let

WU (t,x) = VU (t− 1,x) + λt[rt(U) −πt(U)∆VU (t− 1,x)].

Then the dynamic program is given by

VS(t,x) = max U⊂S

WU (t,x) (18)

with boundary conditions VS(t, 0) = VS(0,x) = 0 for all t ≥ 0, x ∈ N and S ⊂ N. The goal of this formulation is to find VN (T,c) and the corresponding optimal control policy.

Notice that formally the state of the system has been expanded to (S,t,x) where S is the last offered set and (t,x) are, as usual, the time-to-go and the remaining inventory. For- mulation (8) requires optimizing over all subsets S ⊂ N, while formulation (18) requires an optimization over all subsets U ⊂ S for any given S ⊂ N. Obviously the complexity of these formulations is large if the number of fares is more than a handful. Airlines typically have over twenty different fares so the number of possible subsets gets large very quickly. Fortunately, in many cases we do not need to do the optimization over all possible subsets. As we have seen, the optimization can be reduced to the set of efficient fares Ct at time t. For the p-GAM, we know that Ct ⊂ {S0,S1, . . . ,Sn} where Sj = {1, . . . ,j} are the nested-by-fares sets. For the p-GAM, and any other model for which the nested-by-fare property holds, the state of the system reduces to (j,t,x) where Sj is the last offered set at (t,x). For such models the formulation (18) reduces to

Vj(t,x) = max k≤j

Wk(t,x) (19)

where Vj(t,x) = VSj (t,x) and

Wk(t,x) = Vk(t− 1,x) + λt[rkt −πkt∆Vk(t− 1,x)],

rkt = ∑ l∈Sk plπlt(Sk) and πkt =

∑ l∈Sk πlt(Sk).

The structural results obtained for the Independent Demand Model carry on to the p- GAM model and for other models where the efficient sets have the nested-by-fare property. Let

aj(t,x) = max{k ≤ j : Wk(t,x) = Vj(t,x)}. (20)

Theorem 8 At state (j,t,x) it is optimal to offer set Saj (t,x) = {1, . . . ,aj(t,x)}, where aj(t,x), given by (20), is decreasing in t and increasing in x over time intervals where the choice model is time invariant.

The proof of this results follows the sample path arguments of the corresponding proof in the independent demand chapter. Clearly V1(t,x) ≤ V2(t,x) ≤ Vn(t,x) ≤ V (t,x) ≤ Ṽ (t,x) as V (t,x) is not restricted, as Vj(t,x) is, to monotone offerings.

c Load Factor V3(T,c) V (T,c) Ṽ (T,c) 4 4.59 3,769 3,871 4,000 6 3.06 5,356 5,534 6,000 8 2.30 6,897 7,013 7,477

10 1.84 8,259 8,335 8,950 12 1.53 9,304 9,382 10,423 14 1.31 9,976 10,111 10,846 16 1.15 10,418 10,583 11,146 18 1.02 10,803 10,908 11,447 20 0.92 11,099 11,154 11,504 22 0.84 11,296 11,322 11,504 24 0.77 11,409 11,420 11,504 26 0.71 11,466 11,470 11,504 28 0.66 11,490 11,492 11,504

Table 7: Value Functions for Dynamic Allocation Policies

Example 6 Table 7 presents the value functions V3(T,c), V (T,c) and the upper bound Ṽ (T,c) that result from solving the dynamic programs (10) and (19) for the MNL model with fares p1 = $1, 000,p2 = $800,p3 = $500 with price sensitivity βp = −0.0035, schedule quality si = 200 for i = 1, 2, 3 with quality sensitivity βs = 0.005, and an outside alternative with p0 = $1, 100 and schedule quality s0 = 500, Gumbel parameter φ = 1, arrival rate λ = 25 and T = 1. Recall that for the MNL model, the attractiveness vi = e

φµi where µi is the mean utility. In this case µi = βppi + βssi. The computations were done with time rescaled by a factor a = 25, 000. Not surprisingly V3(T,c) ≤ V (T,c) as V3(T,c) constraints fares to remain closed once they are closed for the first time. However, the difference in revenues is relatively small except for load factors around 1.15 where the difference can be as large as 1.5% in revenues.

8 Static Capacity Allocation Models

We will assume that we are working with a choice model with efficient sets that are nested: E0 ⊂ E1 . . . ⊂ Em, even if they are not nested by fare. We continue using the notation πj =∑ k∈Ej πk(Ej) and rj =

∑ k∈Ej pkπk(Ej), so the slopes uj = (rj−rj−1)/(πj−πj−1),j = 1, . . . ,m

are positive and decreasing. We will denote by qj = rj/πj the average fare, conditioned on a sale, under efficient set Ej (action j) for j > 1 and define q0 = 0.

Suppose that the total number of potential customers over the selling horizon is a random variable D. For example, D can be Poisson with parameter Λ. Let Dj be the total demand if only set Ej is offered. Then Dj is conditionally binomial with parameters D and πj, so if D is Poisson with parameter Λ, then Dj is Poisson with parameter Λπj.

We will present an exact solution for the two fare class capacity allocation problem and a heuristic for the multi-fare case. The solution to the two fare class problem is, in effect, an extension of Littlewood’s rule for discrete choice models. The heuristic for the multi-fare problem applies the two fare result to each pair of consecutive actions, say j and j + 1 just as in the case of independent demands.

8.1 Two Fare Classes

While capacity is available, the capacity provider will offer either action 2 associated with efficient set E2 = {1, 2}, or action 1, associated with efficient set E1 = {1}. If the provider runs out of inventory he offers action 0, corresponding to E0 = {0}, which is equivalent to stop selling. We will assume action 2 will be offered first and that y ∈ {0, . . . ,c} units of capacity will be protected for sale under action 1.

Under action 2, sales are min(D2,c−y) and expected revenues are q2E min(D2,c−y) as q2 is the average fare per unit sold under action 2. Of the (D2 − c + y)+ customers denied bookings, a fraction β = π1/π2 will be willing to purchase under action 1. Thus, the demand under action 1 will be a conditionally binomial random variable, say U(y), with a random number (D2 − c + y)+ of trials and success probability β. The expected revenue that results from allowing up to c−y bookings under action 2 is given by

W2(y,c) = q1E min(U(y), max(y,c−D2)) + q2E min(D2,c−y),

where the first term corresponds to the revenue under action 1. Notice that if D2 ≤ c − y then U(y) = 0 since U(y) is a binomial random variable with zero trials. Conditioning the first term on the event D2 > c−y, allow us to write

W2(y,c) = q1E[min(U(y),y)|D2 ≥ c−y)]P(D2 > c−y) + q2E min(D2,c−y).

The reader may be tempted to follow the marginal analysis idea presented in Chapter 1 for the independent demand case. In the independent demand case, the marginal value of protecting one more unit of capacity is realized only if the marginal unit is sold. The counterpart here would be P(U(y) ≥ y|D2 > c−y) and a naive application of marginal analysis would protect the yth unit whenever q1P(U(y) ≥ y|D2 > c−y) > q2.

However, with dependent demands, protecting one more unit of capacity also increases the potential demand under action 1 by one unit. This is because an additional customer is denied capacity under action 2 (when D2 > c − y) and this customer may end up buying a unit of capacity under action 1 even when not all the y units are sold. Ignoring this can lead to very different results in terms of protection levels. The correct analysis is to acknowledge that an extra unit of capacity is sold to the marginal customer with probability βP (U(y − 1) < y − 1|D2 > c−y). This suggest protecting the yth unit whenever

q1[P(U(y) ≥ y|D2 > c−y) + βP (U(y − 1) < y − 1|D2 > c−y)] > q2.

To simplify the left hand side, notice that conditioning on the decision of the marginal customer

results in

P(U(y) ≥ y|D2 > c−y) = βP (U(y−1) ≥ y−1|D2 > c−y)+(1−β)P(U(y−1) ≥ y|D2 > c−y).

Combining terms leads to protecting the yth unit whenever

q1[β + (1 −β)P(U(y − 1) ≥ y|D2 > c−y)] > q2.

Let

r = u2/q1 = q2 −βq1 (1 −β)q1

, (21)

denote the critical fare ratio. In industry, the ratio r given by (21) is know as fare adjusted ratio, in contrast to the unadjusted ratio q2/q1 that results when β = 0.

The arguments above suggests that the optimal protection level can be obtained by se- lecting the largest y ∈ {1, . . . ,c} such that P(U(y − 1) ≥ y|D2 > c − y) > r provided that P(U(0) ≥ 1|D2 ≥ c) > r and to set y = 0 otherwise.

To summarize, an optimal protection can be obtain by setting y(c) = 0 if P(U(0) ≥ 1|D2 > c) ≤ r and setting

y(c) = max{y ∈{1, . . . ,c} : P(U(y − 1) ≥ y|D2 > c−y) > r} otherwise. (22)

Formula (22) is due to Gallego, Lin and Ratliff [7]. This is essentially a reinterpretation of the main result in Brumelle et al. originally developed for fares instead of actions.

One important observation is that for dependent demands the optimal protection level y(c) depends in c in a more complicated way than in the independent demand model. For the dependent demand model y(c) is first increasing and then decreasing in c. The reason is that for low capacity it is optimal to protect all the inventory for sale under action 1. However, for high capacity it is optimal to allocate all the capacity to action 2. The intuition is that action 2 has a higher revenue rate, so with high capacity we give up trying to sell under action 1. This is clearly seen in Table 6 of Example 5. Heuristic solutions that propose protection levels of the form min(yh,c), which are based on independent demand logic, are bound to do poorly when c is close to Λπ2. In particular, the heuristics developed by Belobaba and Weatherford [1], and more recently by Fiig et al. [6] and by Walczak et al. [20] are of this form and are therefore not recommended.

One can derive Littlewood’s rule for discrete choice models (22) formally by analyzing ∆W2(y,c) = W2(y,c)−W2(y−1,c), the marginal value of protecting the yth unit of capacity for sale under action 1.

Proposition 5

∆W2(y,c) = [q1(β + (1 −β)P(U(y − 1) ≥ y|D2 > c−y) − q2]P(D2 > c−y). (23)

Moreover, the expression in brackets is decreasing in y ∈{1, . . . ,c}.

The proof of Proposition 5 is due to Gallego, Lin and Ratliff [7] and can be found in the Appendix. As a consequence, ∆W2(y,c) has at most one sign change. If it does, then it must be from positive to negative. W2(y,c) is then maximized by the largest integer y ∈{1, . . . ,c}, say y(c), such that ∆W2(y,c) is positive, and by y(c) = 0 if ∆W2(1,c) < 0. This confirms Littlewood’s rule for discrete choice models (22).

8.2 Heuristic Protection Levels

In terms of computations, notice that

P(U(y − 1) ≥ y|D2 > c−y) = lim L→∞

∑L k=0 P(Bin(y + k,β) ≥ y)P(D2 = c + k + 1)

P(D2 > c−y) ,

where we have been careful to only include terms that can be potentially positive in the numerator. It is often enough to truncate the sum at a value, say L, such that P(D2 > L) ≤ �P(D2 > c − y) for some small �, as those terms do not materially contribute to the sum. While the computation of y(c) and V2(c) = W2(y(c),c) is not numerically difficult, the conditional probabilities involved may be difficult to understand conceptually. Moreover, the formulas do not provide intuition and do not generalize easily to multiple fares. In this section we develop a simple heuristic to find near-optimal protection levels that provides some of the intuition that is lacking in the computation of optimal protection levels y(c). In addition, the heuristic can easily be extended to multiple-fares.

The heuristic consists of approximating the conditional binomial random variable U(y−1) with parameters (D2−c+y−1)+ and β by its conditional expectation, namely by (Bin(D2,β)− β(c+ 1−y))+. Since Bin(D2,β) is just D1, the approximation yields (D1−β(c+ 1−y))+. We expect this approximation to be reasonable if ED1 ≥ β(c + 1−y). In this case P(U(y−1) ≥ y|D2 > c−y) can be approximated by P(D1 ≥ (1 −β)y + β(c + 1)). Let

yp = max{y ∈N : P(D1 ≥ y) > r)}.

We think of yp = (1 − β)y + β(c + 1) as a pseudo-protection level that will be modified to obtain a heuristic protection level yh(c) when the approximation is reasonable. Let y = (yp −β(c + 1))/(1 −β). Then ED1 ≥ β(c + 1 −y)) is equivalent to the condition

c < yp + d2

where d2 = E[D2] − E[D1] = Λ(π2 − π1) = Λπ2(1 − β), where for convenience we write β = β1 = π1/π2. Consequently, if c < y

p + d2 we set

yh(c) = max

{ y ∈N : y ≤

yp −β(c + 1) (1 −β)

} ∧ c.

When the condition c < yp +d2 fails, we set y h(c) = 0. Thus, the heuristic will stop protecting

capacity for action 1 when c ≥ yp + d2. In a deterministic setting yp = E[D1], so it stops protecting when c ≥ E[D1] + d2 = E[D2], i.e., when there is enough capacity to satisfy the

expected demand under action 2.

Notice that the heuristic involves three modifications to Littlewood’s rule for independent demands. First, instead of using the first choice demand D1,2 for fare 1, when both fares are open, we use the stochastically larger demand D1 for fare 1, when it is the only open fare. Second, instead of using the ratio of the fares p2/p1 we use the modified fare ratio r = u2/q1 based on sell-up adjusted fare values. From this we obtain a pseudo-protection level yp that is then modified to obtain yh(c). Finally, we keep yh(c) if c < yp +d2 and set y

h(c) = 0 otherwise. In summary, the heuristic involves a different distribution, a fare adjustment, and a modification to the pseudo-protection level. The following example illustrates the performance of the heuristic.

Example 7 Suppose that p1 = 1000,p2 = 600 and a BAM with v0 = v1 = v2 = e 1 and that

Λ = 40 as in Example 5. We report the optimal protection level y(c), the heuristic protection level yh(c), the upper bound V̄ (c), the optimal expected revenue V (c) of the uni-directional formulation (18), the performance V2(c) of y(c) and the performance V

h 2 (c) = W2(y

h(c),c) of yh(c), and the percentage gap between (V2(c) − V h2 (c))/V2(c) in Table 8. Notice that the performance of the statiheuristic, V h2 (c), is almost as good as the performance V2(c) of the optimal policy under the uni-directional formulation (18) where fares classes cannot be opened once they are closed.

c y(c) yh(c) V̄ (c) V (c) V2(c) V h

2 (c) Gap(%) 12 12 12 12,000 11,961 11,960 11,960 0.00 16 16 16 16,000 15,610 15,593 15,593 0.00 20 20 20 20,000 18,324 18,223 18,223 0.00 24 21 24 20,800 19,848 19,526 19,512 0.07 28 9 12 21,333 20,668 20,414 20,391 0.11 32 4 0 21,333 21,116 21,036 20,982 0.26 36 3 0 21,333 21,283 21,267 21,258 0.05 40 2 0 21,333 21,325 21,333 21,322 0.01

Table 8: Performance of Heuristic for Two Fare Example

8.3 Theft versus Standard Nesting and Arrival Patterns

The types of inventory controls used in the airline’s reservation system along with the demand order of arrival are additional factors that must be considered in revenue management opti- mization. If y(c) < c, we allow up to c−y(c) bookings under action 2 with all sales counting against the booking limit c−y(c). In essence, the booking limit is imposed on action 2 (rather than on fare 2). This is known as theft nesting. Implementing theft nesting controls may be tricky if a capacity provider needs to exert controls through the use of standard nesting, i.e., when booking limits are only imposed on the lowest open fare. This modification may be required either because the system is built on the philosophy of standard nesting or because users are accustomed to thinking of imposing booking limits on the lowest open fare. Here we explore how one can adapt protection levels and booking limits for the dependent demand model to situations where controls must be exerted through standard nesting.

A fraction of sales under action 2 correspond to sales under fare p2. This fraction is given by ω = π2(E2)/π2. So if booking controls need to be exerted directly on the sales at fare p2, we can set booking limit ω(c−y(c)) on sales at fare p2. This is equivalent to using the larger protection level

ŷ(c) = (1 −ω)c + ωy(c) (24)

for sales at fare 1. This modification makes implementation easier for systems designed for standard nesting controls and it performs very well under a variety of demand arrival patterns.

It is possible to combine demand choice models with fare arrival patterns by sorting cus- tomers through their first choice demand and then assuming a low-to-high demand arrival pattern. For the two fare case, the first choice demands for fare 1 and fare 2 are Poisson random variables with rates Λπ1(E2) and Λπ2(E2). Assume now that customers whose first choice demand is for fare 2 arrive first, perhaps because of purchasing restrictions associated with this fare. Customers whose first choice is fare 2 will purchase this fare if available. They will consider upgrading to fare 1 if fare 2 is not available. One may wonder what kind of control is effective to deal with this arrival pattern. It turns out that setting protection level ŷ(c) given by (24) for fare 1, with standard nesting, is optimal for this arrival pattern and is very robust to other (mixed) arrival patterns (Gallego, Li and Ratliff [8]).

8.4 Multiple Fare Classes

For multiple fare classes, finding optimal protection levels can be very complex. However, if we limit our search to the best two consecutive efficient sets we can easily adapt the results from the two-fare class to deal with the multiple-fare class problem. For any j ∈{1, . . . ,n−1} consider the problem of allocating capacity between actions j (corresponding to efficient set Ej) and action j + 1 (corresponding to efficient set Ej+1) where action j + 1 is offered first. In particular, suppose we want to protect y ≤ c units of capacity for action j against action j + 1. We will then sell min(Dj+1,c−y) units under action j + 1 at an average fare qj+1. We will then move to action j with max(y,c−Dj+1) units of capacity and residual demand Uj(y), where Uj(y) is conditionally binomial with parameters (Dj+1 − c + y)+ and βj = πj/πj+1. Assuming we do not restrict sales under action j the expected revenue under actions j + 1 and j will be given by

Wj+1(y,c) = qjE min(Uj(y), max(y,c−Dj+1)) + qj+1E min(Dj+1,c−y). (25)

Notice that under action j we will either run out of capacity or will run out of customers. Indeed, if Uj(y) ≥ y then we run out of capacity, and if Uj(y) < y then we run out of customers. Let Wj+1(c) = maxy≤c Wj+1(y,c) and set W1(c) = q1E min(D1,c). Clearly,

Vn(c) ≥ max 1≤j≤n

Wj(c), (26)

so a simple heuristic is to compute Wj(c) for each j ∈ {1, . . . ,n} and select j to maxi- mize Wj(c). To find an optimal protection level for Ej against Ej+1 we need to compute ∆Wj+1(y,c) = Wj+1(y,c)−Wj+1(y−1,c). For this we can repeat the analysis of the two fare case to show that an optimal protection level for action Ej against action Ej+1 is given by

yj(c) = 0 if ∆Wj+1(1,c) < 0 and by

yj(c) = max{y ∈{1, . . . ,c} : P(Uj(y − 1) ≥ y|Dj+1 > c−y) > rj}, (27)

where

rj = uj+1 qj

= qj+1 −βjqj (1 −βj)qj

Alternatively, we can use the heuristic described in the two fare section to approximate Uj(y−1) by Dj−βj(c+ 1−y) and use this in turn to approximate the conditional probability in (27) by P(Dj ≥ y + β(c−y + 1)). This involves finding the pseudo-protection level

y p j = max{y ∈N : P(D1 ≥ y) > rj}.

If c < y p j + dj+1, then

yhj (c) = max

{ y ∈N+ : y ≤

y p j −βj(c + 1)

1 −βj

} ∧ c, (28)

and set yh(c) = 0 if c ≥ ypj + dj+1.

We will let V hn (c) be the expected revenues resulting from applying the protection levels.

Example 8 Consider now a three fare example with fares p1 = 1000,p2 = 800,p3 = 500, schedule quality si = 200, i = 1, 2, 3, βp = −.0035, βs = .005, φ = 1. Then v1 = .082,v2 = 0.165,v3 = 0.472. Assume that the outside alternative is a product with price p0 = 1100 and schedule quality s0 = 500 and that the expected number of potential customers is Poisson with parameter Λ = 25. Table 9 reports the protection levels yj(c) and y

h j (c) as well as V3(c) and

V h3 (c) for c ∈{4, 6, . . . , 26, 28}. As shown in the Table, the heuristic performs very well with a maximum gap of 0.14% relative to V3(c) which was computed through exhaustive search. It is also instructive to see that V h3 (c) is not far from V3(c,T), as reported in Table 7, for the dynamic model. In fact, the average gap is less than 0.5% while the largest gap is 1.0% for c = 18.

Example 7 suggest that the heuristic for the static model works almost as well as the optimal dynamic program Vn(T,c) for the case where efficient sets are nested-by-fare and fares cannot be opened once they are closed for the first time. Thus the multi-fare heuristic described in this section works well to prevent strategic customers from gaming the system provided that the efficient fares are nested-by-fare as they are in a number of important applications. While the heuristic for the static model gives up a bit in terms of performance relative to the dynamic model, it has several advantages. First, the static model does not need the overall demand to be Poisson. Second, the static model does not need as much detail in terms of the arrival rates. These advantages are part of the reason why people in industry have a preference for static models, even thought dynamic models are easier to understand, easier to solve to optimality, and just as easy to implement.

c Load y1(c) y2(c) y h 1 (c) y

h 2 (c) V3(c) V

h 3 (c) Gap(%)

4 4.59 4 4 4 4 3,769 3,769 0.00 6 3.06 3 6 3 6 5,310 5,310 0.00 8 2.30 1 8 1 8 6,845 6,845 0.00

10 1.84 0 10 0 10 8,217 8,217 0.00 12 1.53 0 12 0 12 9,288 9,288 0.00 14 1.31 0 14 0 14 9,971 9,971 0.00 16 1.15 0 13 0 14 10,357 10,354 0.02 18 1.02 0 9 0 10 10,700 10,694 0.05 20 0.92 0 5 0 6 11,019 11,019 0.00 22 0.84 0 4 0 2 11,254 11,238 0.14 24 0.77 0 3 0 0 11,391 11,388 0.03 26 0.71 0 2 0 0 11,458 11,450 0.08 28 0.66 0 2 0 0 11,488 11,485 0.03

Table 9: Performance of Heuristic for Three Fare Example

9 Acknowledgments

I acknowledge the feedback from my students and collaborators. In particular, I would like to recognize the contributions and feedback from Anrand Li, Lin Li, Jing Dong and Richard Ratliff.

10 Appendix

Proof of Theorem 1: If the choice model πj(S),j ∈ S satisfies GAM then there exist constants vi, i ∈ N+, ṽi ∈ [0,vi] i ∈ N and ṽ0 = v0 +

∑ j∈N (vj − ṽj) such that πR(S) = v(R)/(ṽ0 + Ṽ (S)

for all R ⊂ S. For this choice model, the left hand side of Axiom 1’ is (ṽ0 +Ṽ (S)/(ṽ0 +Ṽ (T)). Let θj = 1−ṽj/vj. Then the right hand side of Axiom 1’ is given by 1−

∑ j∈T−S(1−θj)vj/(ṽ0 +

Ṽ (T)) = (ṽ0 + Ṽ (S)/(ṽ0 + Ṽ (T)). This shows that the GAM satisfies Axiom 1’. If πi({i}) = 0 then vi = 0 and consequently ṽi = 0. From this it follows that πS−{i}(T −{i}) =

V (S)−vi ṽ0+Ṽ (T)−ṽi

= V (S)

ṽ0+Ṽ (T) = πS(T), so Axiom 2 holds.

Conversely, suppose a choice model satisfies the GLA. Then by selecting R = {i} ⊂ S ⊂ T = N we see from Axiom 1’ that

πi(S) = πi(N)

1 − ∑ j/∈S(1 −θj)πj(N)

Since π0(N) + ∑ j∈N πj(N) = 1 the denominator can be written as π0(N) +

∑ j∈N θjπj(N) +∑

j∈S(1 −θj)πj(N), resulting in

πi(S) = πi(N)

π0(N) + ∑ j∈N θjπj(N) +

∑ j∈S(1 −θj)πj(N)

Letting vj = πj(N) for all j ∈ N+, ṽ0 = v0 + ∑ j∈N θjvj and ṽj = (1 −θj)vj ∈ [0, 1] for j ∈ N

the choice model can be written as

πi(S) = vi

ṽ0 + ṽ(S) ,

so it satisfies the GAM.

Proof of Theorem 2: Let N ′ be the universe of products. The set of vendors is M = {1, . . . ,m}. Let vij be the attraction value of product i offered by vendor j, and let Sj ⊂ N ′ be the products offered by vendor j ∈ M. Let Oi = {j ∈ M : i ∈ Sj} be the set of vendors offering product i, and N = {i ∈ N ′ : Oi 6= ∅} be the collection of products that are offered. We consider a NL model where customers first select a product and then a vendor. Let γi be the dissimilarity parameter of nest i. Under the nested model, a nest i with Oi 6= ∅, is selected with probability (∑

l∈Oi v 1/γi il

)γi v0 +

∑ k

(∑ l∈Ok v

1/γk kl

)γk . (29) The conditional probability that a customer who selects nest i will buy the product from

vendor j ∈ Oi is given by v

1/γi ij∑

l∈Oi v 1/γi il

. (30)

Consequently, the probability that a customer selects product i ∈ Sj from vendor j is given

by (∑ l∈Oi v

1/γi il

)γi v0 +

∑ k

(∑ l∈Ok v

1/γk kl

)γk v 1/γi ij∑

l∈Oi v 1/γi il

We are interested in the limit of these probabilities as γk ↓ 0 for all k. For this purpose we remark that well know fact: limγk↓0(

∑ l∈Ok v

1/γk il )

γk = maxl∈Oi vil. The limit, say pij, of equation (30) is equal to 1 if vendor j is the most attractive, i.e., if vij > vil for all l ∈ Oi, l 6= j; pij = 0, if vij < maxl∈Oi vil; and pij = 1/m if vij = maxl∈Oi vil but vendor j is tied with other m− 1 vendors.

Now,

πi(Sj) = pij maxl∈Oi vil

v0 + ∑ k maxl∈Ok vkl

For all k ∈ N, define V −kj = maxl∈Ok,l 6=j vkl and Vkj = maxl∈Ok∪{j} vkl. Then, for i ∈ Sj,

πi(Sj) = Vijpij

v0 + ∑ k∈Sj Vkj +

∑ k∈S̄j V

− kj

where S̄j = {i ∈ N : i /∈ Sj}. The selection probability can be written as

πi(Sj) = Vijpij

v0 + ∑ k∈N V

− kj +

∑ k∈Sj (Vkj −V

− kj )

or equivalently, as

πi(Sj) = Vijpij

v0 + ∑ k∈N V

− kj +

∑ k∈Sj (Vkj −V

− kj )pkj

since Vkj − V −kj = (Vkj − V − kj )pkj for all k. This is because pkj < 1 implies Vkj − V

− kj = 0.

This shows that πi(Sj) corresponds to a GAM with ã0j = v0 + ∑ k∈N V

− kj , akj = pkjVkj, and

ãkj = (Vkj −V −kj )pkj.

Proof of Proposition 2: Notice that π(S) is increasing in S for the GAM since V (S) ≥ Ṽ (S) ≥ 0 are both increasing in S. Since the P-GAM is a special case π(S) is increasing in S. Now, consider a set T that is not nested-by-fare and let ρ = π(T). Then there exist an index k such that πk < ρ ≤ πk+1. Let tk = (πk+1 −ρ)/(πk+1 −πk), tk+1 = 1 − tk and tj = 0 for all other j. Then

∑n j=1 πjtj = tkπk + tk+1πk+1 = ρ. We will show that the vector x, defined by

xj = tkπj(Sk) + tk+1πj(Sk+1),j = 1, . . . ,n majorizes the vector πj(T), j = 1, . . .n. To do this we will need to establish the following equation:

tkπi(Sk) + tk+1πi(Sk+1) = vi

ṽ0 + Ṽ (T) ∀ i ≤ k. (31)

The result then follows, because for j ≤ k, equation (31) implies that

x[1,j] = v(Sj)

ṽ0 + Ṽ (T) ≥ πSj (T),

as not all elements in Sj need to be in T. On the other hand, for j > k, we have

x[1,j] = π(T) ≥ πSj (T),

since again, not all elements in Sj need to be in T.

To verify equation (31) we will show that tk and tk+1 can be written as

tk = ṽ0 + ṽ(Sk)

ṽ0 + ṽ(T)

v(Sk + 1) −v(T) v(Sk + 1) −v(Sk)

and

tk+1 = ṽ0 + ṽ(Sk+1)

ṽ0 + ṽ(T)

v(T) −v(Sk) v(Sk + 1) −v(Sk)

as then (31) follows from simple algebra. We know from arguments in the proof of Proposi- tion 2, that

πk+1 −πk = πk+1(Sk+1) ṽ0

ṽ0 + ṽ(Sk) ,

so showing the result for tk is equivalent to showing that

πk+1 −ρ = v(Sk + 1) −v(T) ṽ0 + ṽ(Sk + 1)

ṽ0 ṽ0 + ṽ(T)

which follows from easily from the fact that ṽj = (1−θ)vj for all j. The proof for tk+1 follows along similar lines.

Proof of Proposition 5: We need to compute the difference between E[min(U(y), max(y,c− D2)) and E[min(U(y − 1), (y − 1,c− D2)]. Since both quantities are zero when c − D2 ≥ y and if c − D2 < y the difference reduces to E[min(U(y),y)] − E[min(U(y − 1),y − 1)]. By expressing the expectation as the sum of tail probabilities and conditioning on the demand from the potential customer we see that

E[min(U(y),y)] = y∑ z=1

P(U(y) ≥ z)

= P(U(y) ≥ y) + β y−1∑ z=1

P(U(y − 1) ≥ z − 1) + (1 −β) y−1∑ z=1

P(U(y − 1) ≥ z).

We now use the fact that E[min(U(y − 1),y − 1)] = ∑y−1 z=1 P(U(y − 1) ≥ z), to express the

difference as

E[min(U(y),y)] −E[min(U(y − 1),y − 1)] = P(U(y) ≥ y) + βP (U(y − 1) < y − 1).

Since P(U(y) ≥ y) = βP (U(y − 1) ≥ y − 1) + (1 −β)P(U(y − 1) ≥ y), we conclude that

E[min(U(y),y)]−E[min(U(y−1),y−1)] = [β+(1−β)P(U(y−1) ≥ y|D2 > c−y)]P(D2 > c−y).

On the other hand, E min(c−y,D2) −E min(c + 1 −y,D2)] = −P(D2 > c−y). Therefore,

∆W2(c,y) = [q1(β + (1 −β)P(U(y − 1) ≥ y|D2 > c−y) − q2)]P(D2 > c−y)

as claimed.

References

[1] Belobaba, P. P., and L.R. Weatherford. (1996) “Comparing Decision Rules that Incorpo- rate Customer Diversion in Perishable Asset Revenue Management Situations.” Decision Sciences, 27, 343-363.

[2] Blanchet, J, Gallego, G., and V. Goyal (2013) “A Markov Chain Approximation to Discrete Choice Modeling.” Working paper, Columbia University, New York, NY.

[3] Brumelle, S. L., J. I. McGill, T. H. Oum, M. W. Tretheway and K. Sawaki. (1990) “Allocation of Airline Seats Between Stochastically Dependent Demands,” Transporation Science, 24, 183-192.

[4] Cooper, W., Homem de Mello, T., and Kleywegt, A. (2006) “Models of the spiral-down effect in revenue management.” (Operations Research, 54, 968-987.

[5] Domencich, T. A., and D. McFadden (1975) “Urban Travel Demand : A behavioral analysis,” North Holland Publishing Co., Amsterdam.

[6] Fiig, Thomas, Isler, K., Hopperstad, C., and P. Belobaba. 2010. “Optimization of Mixed Fare Structures: Theory and Applications.” Journal of Revenue and Pricing Manage- ment. 9, 152-170.

[7] Gallego, G., L. Lin and R. Ratliff (2009) “Choice Based EMSR methods for single-resource revenue management with demand dependencies.” Journal of Revenue and Pricing Man- agement, 8, 207-240.

[8] Gallego, G., L. Lin and R. Ratliff (2009) “Demand Arrival Order and RM Controls.” AGIFORS Cargo and RM Study Group

[9] Gallego, G., Ratliff, R., and S. Shebalov (2010) “A General Attraction Model and an Efficient Formulation for the Network Revenue Management Problem.” Submitted to Operations Research.

[Green(1984)] Green, P. J. 1984. Journal of the Royal Statistical Society. Series B, 46, 149-192.

[Greene(1984)] Greene, W.H. 2000. Econometric Analysis. Fourth edition. London: Prentice Hall.

[10] Kincaid, W.M. and Darling, D.A. (1963) “An inventory pricing problem.” Journal of Mathematical Analysis and Applications 7: 183208.

[11] Lee. C. T. and M. Hersh. (1993) “A Model for Dynamic Airline Seat Inventory Control with Multiple Seat Bookings,” Transportation Science, 27, 252–265.

[12] Luce, R. D. (1959). “Individual Choice Behavior: A Theoretical Analysis.” New York: Wiley

[13] Luce, R.D. (1977) ”The Choice Axiom after Twenty Years.” Journal of Mathematical Psychology, 15, pp. 215-233.

[14] Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Ap- plications. Academic Press, New York, NY.

[15] McFadden, D. (1978) . Modelling the choice of residential location, in A. Karlquist, ed., Spatial Interaction Theory and Planning Models, North Holland Publishing Co., Amsterdam, chapter 25, pp. 7596.

[16] McFadden, D. (1980). “Econometric models for probabilistic choice among products.” The Journal of Business. 53, 13-29.

[17] McFadden, D., K. Train. (2000). “Mixed mnl models for discrete response.” Journal of applied Econometrics. 15, 447470.

[18] Rusmevichientong, P., Shmoys, D. and H. Topaloglu. (2010). “Assortment Optimization with Mixtures of Logits.” Working paper, Cornell University.

[19] Talluri, K and G. van Ryzin. (2004) “Revenue Management Under a General Discrete Choice Model of Consumer Behavior.” Management Science, 50, 15-33.

[Train(2002)] Train, K. 2002. Discrete Choice Methods with Simulation. Cambridge Univer- sity Press.

[20] Walczack, D., S. Mardan, and Kallesen, R. 2010. “Customer voice, fare adjustments and the marginal expected revenue data transformation: A note on using old yield manage- ment techniques in the brave new world of pricing.” Journal of Revenue and Pricing Management, 9, 94-109.