queation 1,4,5,6

profilephilxu1988
probability_review_and_intro_to_dynamic_programming.pdf

Probability and Dynamic Programming

Review for

Dynamic Pricing and Revenue Management

Probability Review • Poisson: http://en.wikipedia.org/wiki/Poisson_random_variable

– Read Sections: 1, 2, 3,4

• Compound Poisson: http://en.wikipedia.org/wiki/Compound_Poisson_distribution

• Poisson Process: http://en.wikipedia.org/wiki/Poisson_process • Compound Poisson Process:

http://en.wikipedia.org/wiki/Compound_Poisson_process

• Normal: http://en.wikipedia.org/wiki/Normal_random_variable – Read Sections 1,2.1,2.2,3,1,3.2,3.3

• Brownian Motion: http://en.wikipedia.org/wiki/Wiener_process – Read sections 1, 2.

DP AS AN OPTIMIZATION METHODOLOGY

• Basic optimization problem

min u∈U

g(u)

where u is the optimization/decision variable, g(u) is the cost function, and U is the constraint set

• Categories of problems: − Discrete (U is finite) or continuous − Linear (g is linear and U is polyhedral) or

nonlinear − Stochastic or deterministic: In stochastic prob-

lems the cost involves a stochastic parameter w, which is averaged, i.e., it has the form

g(u) = Ew { G(u, w)

} where w is a random parameter.

• DP can deal with complex stochastic problems where information about w becomes available in stages, and the decisions are also made in stages and make use of this information.

{ } ∑ { } ∑ ( )

INVENTORY CONTROL EXAMPLE

Inventory S y s t e m

Stock Ordered at Period k

Stock at Period k Stock at Period k + 1

Demand at Period k

xk

wk

xk + 1 = xk + uk - wk

u k Co s t o f P e rio d k

c uk + r (xk + uk - wk)

• Discrete-time system

xk+1 = fk(xk, uk, wk) = xk + uk − wk • Cost function that is additive over time

N−1 E gN (xN ) + gk(xk, uk, wk)

k=0

N−1 = E cuk + r(xk + uk − wk)

k=0

• Optimization over policies: Rules/functions uk = µk(xk) that map states to controls

BASIC STRUCTURE OF STOCHASTIC DP

• Discrete-time system

xk+1 = fk(xk, uk, wk), k = 0, 1, . . . , N − 1

− k: Discrete time − xk: State; summarizes past information that

is relevant for future optimization − uk: Control; decision to be selected at time

k from a given set − wk: Random parameter (also called distur-

bance or noise depending on the context) − N : Horizon or number of times control is

applied

• Cost function that is additive over time

E

{ gN (xN ) +

N−1∑ k=0

gk(xk, uk, wk)

}

Guillermo
Pencil

BASIC PROBLEM

• System xk+1 = fk(xk, uk, wk), k = 0, . . . , N −1 • Control constraints uk ∈ U (xk) • Probability distribution Pk(· | xk, uk) of wk • Policies π = {µ0, . . . , µN−1}, where µk maps states xk into controls uk = µk(xk) and is such that µk(xk) ∈ Uk(xk) for all xk • Expected cost of π starting at x0 is

Jπ(x0) = E

{ gN (xN ) +

N−1∑ k=0

gk(xk, µk(xk), wk)

}

• Optimal cost function

J∗(x0) = min π

Jπ(x0)

• Optimal policy π∗ is one that satisfies

Jπ∗ (x0) = J∗(x0)

PRINCIPLE OF OPTIMALITY

• Let π∗ = {µ∗0, µ∗1, . . . , µ∗N−1} be an optimal pol- icy

• Consider the “tail subproblem” whereby we are at xi at time i and wish to minimize the “cost-to- go” from time i to time N

E

{ gN (xN ) +

N−1∑ k=i

gk ( xk, µk(xk), wk

)}

and the “tail policy” {µ∗i , µ∗i+1, . . . , µ∗N−1}

0 Ni

xi Tail Subproblem

• Principle of optimality: The tail policy is opti- mal for the tail subproblem

• DP first solves ALL tail subroblems of final stage

• At the generic step, it solves ALL tail subprob- lems of a given time length, using the solution of the tail subproblems of shorter time length

DP ALGORITHM

• Start with

JN (xN ) = gN (xN ),

and go backwards using

Jk(xk) = min uk∈Uk(xk)

E wk

{ gk(xk, uk, wk)

+ Jk+1 ( fk(xk, uk, wk)

)} , k = 0, 1, . . . , N − 1.

• Then J0(x0), generated at the last step, is equal to the optimal cost J∗(x0). Also, the policy

π∗ = {µ∗0, . . . , µ∗N−1} where µ∗k(xk) minimizes in the right side above for each xk and k, is optimal.

• Justification: Proof by induction that Jk(xk) is equal to J∗k (xk), defined as the optimal cost of the tail subproblem that starts at time k at state xk.

• Note that ALL the tail subproblems are solved in addition to the original problem, and the inten- sive computational requirements.

DETERMINISTIC FINITE-STATE PROBLEM

Terminal Arcs

. . .

. . .

. . .

Initial State s

t Artificial Terminal N o d e

with Cost Equal to Terminal Cost

S t a g e 0 S t a g e 1 S t a g e 2 . . . S t a g e N - 1 S t a g e N

• States <==> Nodes • Controls <==> Arcs • Control sequences (open-loop) <==> paths from initial state to terminal states

• ak : Cost of transition from state i ∈ Sk to state ij j ∈ Sk+1 at time k (view it as “length” of the arc) • aN : Terminal cost of state i ∈ SNit • Cost of control sequence <==> Cost of the cor­ responding path (view it as “length” of the path)

BACKWARD AND FORWARD DP ALGORITHMS

• DP algorithm: JN (i) = aNit , i ∈ SN ,

Jk(i) = min j∈Sk+1

[ akij +Jk+1(j)

] , i ∈ Sk, k = 0, . . . , N−1.

The optimal cost is J0(s) and is equal to the length of the shortest path from s to t.

• Observation: An optimal path s → t is also an optimal path t → s in a “reverse” shortest path problem where the direction of each arc is reversed and its length is left unchanged.

• Forward DP algorithm (= backward DP algo- rithm for the reverse problem):

J̃N (j) = a0sj , j ∈ S1, J̃k(j) = min

i∈SN−k

[ aN−kij + J̃k+1(i)

] , j ∈ SN−k+1

The optimal cost is J̃0(t) = mini∈SN [ aNit + J̃1(i)

] .

• View J̃k(j) as optimal cost-to-arrive to state j from initial state s.

A NOTE ON FORWARD DP ALGORITHMS

• There is no forward DP algorithm for stochastic problems.

• Mathematically, for stochastic problems, we cannot restrict ourselves to open-loop sequences, so the shortest path viewpoint fails.

• Conceptually, in the presence of uncertainty, the concept of “optimal-cost-to-arrive” at a state xk does not make sense. The reason is that it may be impossible to guarantee (with prob. 1) that any given state can be reached.

• By contrast, even in stochastic problems, the concept of “optimal cost-to-go” from any state xk makes clear sense.

GENERIC SHORTEST PATH PROBLEMS

• {1, 2, . . . , N, t}: nodes of a graph (t: the desti- nation)

• aij : cost of moving from node i to node j • Find a shortest (minimum cost) path from each node i to node t

• Assumption: All cycles have nonnegative length. Then an optimal path need not take more than N moves

• We formulate the problem as one where we re- quire exactly N moves but allow degenerate moves from a node i to itself with cost aii = 0.

Jk(i) = optimal cost of getting from i to t in N−k moves.

J0(i): Cost of the optimal path from i to t.

• DP algorithm: Jk(i) = min

j=1,...,N

[ aij +Jk+1(j)

] , k = 0, 1, . . . , N−2,

with JN−1(i) = ait, i = 1, 2, . . . , N.

EXAMPLE

2 7 5

2 5 5

6 1

3

0 . 5 3

1

2

4

0 1 2 3 4

1

2

3

4

5

State i

S t a g e k

3 3 3 3

4 4 4 5

4 . 5 4 . 5 5 . 5 7

2 2 2 2

Destination 5

(a) (b)

JN−1(i) = ait, i = 1, 2, . . . , N,

Jk(i) = min j=1,...,N

[ aij +Jk+1(j)

] , k = 0, 1, . . . , N−2.

  • Probability and Dynamic ProgrammingReview
  • Probability Review