question 1,4,5,6
Probability
and Dynamic Programming
Review
for
Dynamic Pricing and Revenue
Management
Probability Review
• Poisson: http://en.wikipedia.org/wiki/Poisson_random_variable
–
Read Sections: 1, 2, 3,4
• Compound Poisson:
http://en.wikipedia.org/wiki/Compound_Poisson_distribution
• Poisson Process: http://en.wikipedia.org/wiki/Poisson_process
• Compound Poisson Process:
http://en.wikipedia.org/wiki/Compound_Poisson_process
• Normal: http://en.wikipedia.org/wiki/Normal_random_variable
–
Read Sections 1,2.1,2.2,3,1,3.2,3.3
• Brownian Motion: http://en.wikipedia.org/wiki/Wiener_process
–
Read sections 1, 2.
DP AS AN OPTIMIZATION METHODOLOGY
• Basic optimization problem
min g( u)
u∈U
where u is the optimization/decision variable, g( u) is the cost function, and U is the constraint set
• Categories of problems:
− Discrete ( U is finite) or continuous
− Linear ( g is linear and U is polyhedral) or nonlinear
− Stochastic or deterministic: In stochastic problems the cost involves a stochastic parameter
w, which is averaged, i.e., it has the form g( u) = Ew G( u, w) where w is a random parameter.
• DP can deal with complex stochastic problems where information about w becomes available in stages, and the decisions are also made in stages and make use of this information.
INVENTORY CONTROL EXAMPLE
w
Demand at Period k
k
Stock at Period k
Stock at Period k + 1
Inventory
xk
S y s t e m
xk + 1 = xk + uk - wk
Stock Ordered at
Period k
Co s t o f P e rio d k
u k
c uk + r (xk + uk - wk)
• Discrete-time system
xk+1 = fk( xk, uk, wk) = xk + uk − wk
• Cost function that is additive over time N − 1
E
gN ( xN ) +
gk( xk, uk, wk)
k=0
N − 1
= E
cuk + r( xk + uk − wk) k=0
• Optimization over policies: Rules/functions uk =
µk( xk) that map states to controls BASIC STRUCTURE OF STOCHASTIC DP
• Discrete-time system
xk+1 = fk( xk, uk, wk) , k = 0 , 1 , . . . , N − 1
− k: Discrete time
− xk: State; summarizes past information that is relevant for future optimization
− uk: Control; decision to be selected at time k from a given set
− wk: Random parameter (also called distur-bance or noise depending on the context)
− N: Horizon or number of times control is applied
• Cost function that is additive over time N − 1
E
gN ( xN ) +
gk( xk, uk, wk)
k=0
BASIC PROBLEM
• System xk+1 = fk( xk, uk, wk), k = 0 , . . . , N − 1
• Control constraints uk ∈ U( xk)
• Probability distribution Pk( · | xk, uk) of wk
• Policies π = {µ 0 , . . . , µN− 1 }, where µk maps states xk into controls uk = µk( xk) and is such that µk( xk) ∈ Uk( xk) for all xk
• Expected cost of π starting at x 0 is N − 1
Jπ( x 0) = E gN ( xN ) +
gk( xk, µk( xk) , wk) k=0
• Optimal cost function
J ∗( x 0) = min Jπ( x 0) π
• Optimal policy π∗ is one that satisfies Jπ∗( x 0) = J∗( x 0) PRINCIPLE OF OPTIMALITY
• Let π∗ = {µ∗ 0 , µ∗ 1 , . . . , µ∗N− 1 } be an optimal policy
• Consider the “tail subproblem” whereby we are at xi at time i and wish to minimize the “cost-to-go” from time i to time N
N − 1
E
gN ( xN ) +
gk xk, µk( xk) , wk
k= i
and the “tail policy” {µ∗, µ∗
i
i+1 , . . . , µ∗
N − 1 }
xi
Tail Subproblem
0
i
N
• Principle of optimality: The tail policy is optimal for the tail subproblem
• DP first solves ALL tail subroblems of final stage
• At the generic step, it solves ALL tail subproblems of a given time length, using the solution of the tail subproblems of shorter time length
DP ALGORITHM
• Start with
JN ( xN ) = gN ( xN ) , and go backwards using
Jk( xk) =
min
E gk( xk, uk, wk)
uk∈Uk( xk) wk
+ Jk+1 fk( xk, uk, wk) , k = 0 , 1 , . . . , N − 1 .
• Then J 0( x 0), generated at the last step, is equal to the optimal cost J ∗( x 0). Also, the policy π∗ = {µ∗ 0 , . . . , µ∗N− 1 }
where µ∗( x
k
k) minimizes in the right side above for
each xk and k, is optimal.
• Justification: Proof by induction that Jk( xk) is equal to J ∗( x
k
k), defined as the optimal cost of
the tail subproblem that starts at time k at state xk.
• Note that ALL the tail subproblems are solved in addition to the original problem, and the inten-sive computational requirements.
DETERMINISTIC FINITE-STATE PROBLEM
Terminal Arcs
with Cost Equal
to Terminal Cost
. . .
t
Artificial Terminal
. . .
N o d e
Initial State
s
. . .
Stage 0
Stage 1
Stage 2
. . . Stage N - 1
Stage N
• States < == > Nodes
• Controls < == > Arcs
• Control sequences (open-loop) < == > paths from initial state to terminal states
• ak : Cost of transition from state i ∈ Sk to state ij
j ∈ Sk+1 at time k (view it as “length” of the arc)
• aN : Terminal cost of state i ∈ S
it
N
• Cost of control sequence < == > Cost of the cor
responding path (view it as “length” of the path) BACKWARD AND FORWARD DP ALGORITHMS
• DP algorithm:
JN ( i) = aN , i ∈ S
it
N ,
Jk( i) = min ak + Jk+1( j) , i ∈ Sk, k = 0 , . . . , N − 1 .
j∈S
ij
k+1
The optimal cost is J 0( s) and is equal to the length of the shortest path from s to t.
• Observation: An optimal path s → t is also an optimal path t → s in a “reverse” shortest path problem where the direction of each arc is reversed and its length is left unchanged.
• Forward DP algorithm (= backward DP algorithm for the reverse problem):
˜
JN ( j) = a 0 , j ∈ S
sj
1 ,
˜
Jk( j) = min aN−k + ˜
Jk+1( i) , j ∈ SN−k+1
i∈S
ij
N−k
The optimal cost is ˜
J 0( t) = min i∈S
aN + ˜
J
N
it
1( i) .
• View ˜
Jk( j) as optimal cost-to-arrive to state j from initial state s.
A NOTE ON FORWARD DP ALGORITHMS
• There is no forward DP algorithm for stochastic problems.
• Mathematically, for stochastic problems, we cannot restrict ourselves to open-loop sequences, so the shortest path viewpoint fails.
• Conceptually, in the presence of uncertainty, the concept of “optimal-cost-to-arrive” at a state xk does not make sense. The reason is that it may be impossible to guarantee (with prob. 1) that any given state can be reached.
• By contrast, even in stochastic problems, the concept of “optimal cost-to-go” from any state xk makes clear sense.
GENERIC SHORTEST PATH PROBLEMS
• { 1 , 2 , . . . , N, t}: nodes of a graph ( t: the desti-nation)
• aij: cost of moving from node i to node j
• Find a shortest (minimum cost) path from each node i to node t
• Assumption: All cycles have nonnegative length.
Then an optimal path need not take more than N
moves
• We formulate the problem as one where we require exactly N moves but allow degenerate moves from a node i to itself with cost aii = 0.
Jk( i) = optimal cost of getting from i to t in N −k moves.
J 0( i): Cost of the optimal path from i to t.
• DP algorithm:
Jk( i) = min
aij+ Jk+1( j) ,
k = 0 , 1 , . . . , N − 2 , j=1 ,...,N
with JN− 1( i) = ait, i = 1 , 2 , . . . , N.
EXAMPLE
State i
Destination
5
5
3
3
3
3
2
3
4
7
5
4
4
4
5
1
4
3
2
5
5
4.5
4.5
5.5
7
2
6
1
2
2
2
2
1
2
3
0 . 5
0
1
2
3
4
Stage k
(a)
(b)
JN− 1( i) = ait,
i = 1 , 2 , . . . , N,
Jk( i) = min
aij+ Jk+1( j) ,
k = 0 , 1 , . . . , N − 2 .
j=1 ,...,N
Document Outline Probability and Dynamic ProgrammingReview Probability Review
- Probability and Dynamic ProgrammingReview
- Probability Review