question 1,4,5,6

profilekimieybibz
probability_review_and_intro_to_dynamic_programmin.pdf

Probability

and Dynamic Programming

Review

for

Dynamic Pricing and Revenue

Management

Probability Review

• Poisson: http://en.wikipedia.org/wiki/Poisson_random_variable

Read Sections: 1, 2, 3,4

• Compound Poisson:

http://en.wikipedia.org/wiki/Compound_Poisson_distribution

• Poisson Process: http://en.wikipedia.org/wiki/Poisson_process

• Compound Poisson Process:

http://en.wikipedia.org/wiki/Compound_Poisson_process

• Normal: http://en.wikipedia.org/wiki/Normal_random_variable

Read Sections 1,2.1,2.2,3,1,3.2,3.3

• Brownian Motion: http://en.wikipedia.org/wiki/Wiener_process

Read sections 1, 2.

DP AS AN OPTIMIZATION METHODOLOGY

• Basic optimization problem

min g( u)

u∈U

where u is the optimization/decision variable, g( u) is the cost function, and U is the constraint set

• Categories of problems:

− Discrete ( U is finite) or continuous

− Linear ( g is linear and U is polyhedral) or nonlinear

− Stochastic or deterministic: In stochastic problems the cost involves a stochastic parameter

w, which is averaged, i.e., it has the form g( u) = Ew G( u, w) where w is a random parameter.

• DP can deal with complex stochastic problems where information about w becomes available in stages, and the decisions are also made in stages and make use of this information.

INVENTORY CONTROL EXAMPLE

w

Demand at Period k

k

Stock at Period k

Stock at Period k + 1

Inventory

xk

S y s t e m

xk + 1 = xk + uk - wk

Stock Ordered at

Period k

Co s t o f P e rio d k

u k

c uk + r (xk + uk - wk)

• Discrete-time system

xk+1 = fk( xk, uk, wk) = xk + uk − wk

• Cost function that is additive over time N − 1

E

gN ( xN ) +

gk( xk, uk, wk)

k=0

N − 1

= E

cuk + r( xk + uk − wk) k=0

• Optimization over policies: Rules/functions uk =

µk( xk) that map states to controls BASIC STRUCTURE OF STOCHASTIC DP

• Discrete-time system

xk+1 = fk( xk, uk, wk) , k = 0 , 1 , . . . , N − 1

− k: Discrete time

− xk: State; summarizes past information that is relevant for future optimization

− uk: Control; decision to be selected at time k from a given set

− wk: Random parameter (also called distur-bance or noise depending on the context)

− N: Horizon or number of times control is applied

• Cost function that is additive over time N − 1

E

gN ( xN ) +

gk( xk, uk, wk)

k=0

BASIC PROBLEM

• System xk+1 = fk( xk, uk, wk), k = 0 , . . . , N − 1

• Control constraints uk ∈ U( xk)

• Probability distribution Pk( · | xk, uk) of wk

• Policies π = {µ 0 , . . . , µN− 1 }, where µk maps states xk into controls uk = µk( xk) and is such that µk( xk) ∈ Uk( xk) for all xk

• Expected cost of π starting at x 0 is N − 1

Jπ( x 0) = E gN ( xN ) +

gk( xk, µk( xk) , wk) k=0

• Optimal cost function

J ∗( x 0) = min Jπ( x 0) π

• Optimal policy π∗ is one that satisfies Jπ∗( x 0) = J∗( x 0) PRINCIPLE OF OPTIMALITY

• Let π∗ = {µ∗ 0 , µ∗ 1 , . . . , µ∗N− 1 } be an optimal policy

• Consider the “tail subproblem” whereby we are at xi at time i and wish to minimize the “cost-to-go” from time i to time N

N − 1

E

gN ( xN ) +

gk xk, µk( xk) , wk

k= i

and the “tail policy” {µ∗, µ∗

i

i+1 , . . . , µ∗

N − 1 }

xi

Tail Subproblem

0

i

N

• Principle of optimality: The tail policy is optimal for the tail subproblem

• DP first solves ALL tail subroblems of final stage

• At the generic step, it solves ALL tail subproblems of a given time length, using the solution of the tail subproblems of shorter time length

DP ALGORITHM

• Start with

JN ( xN ) = gN ( xN ) , and go backwards using

Jk( xk) =

min

E gk( xk, uk, wk)

uk∈Uk( xk) wk

+ Jk+1 fk( xk, uk, wk) , k = 0 , 1 , . . . , N − 1 .

• Then J 0( x 0), generated at the last step, is equal to the optimal cost J ∗( x 0). Also, the policy π∗ = {µ∗ 0 , . . . , µ∗N− 1 }

where µ∗( x

k

k) minimizes in the right side above for

each xk and k, is optimal.

• Justification: Proof by induction that Jk( xk) is equal to J ∗( x

k

k), defined as the optimal cost of

the tail subproblem that starts at time k at state xk.

• Note that ALL the tail subproblems are solved in addition to the original problem, and the inten-sive computational requirements.

DETERMINISTIC FINITE-STATE PROBLEM

Terminal Arcs

with Cost Equal

to Terminal Cost

. . .

t

Artificial Terminal

. . .

N o d e

Initial State

s

. . .

Stage 0

Stage 1

Stage 2

. . . Stage N - 1

Stage N

• States < == > Nodes

• Controls < == > Arcs

• Control sequences (open-loop) < == > paths from initial state to terminal states

• ak : Cost of transition from state i ∈ Sk to state ij

j ∈ Sk+1 at time k (view it as “length” of the arc)

• aN : Terminal cost of state i ∈ S

it

N

• Cost of control sequence < == > Cost of the cor​

responding path (view it as “length” of the path) BACKWARD AND FORWARD DP ALGORITHMS

• DP algorithm:

JN ( i) = aN , i ∈ S

it

N ,

Jk( i) = min ak + Jk+1( j) , i ∈ Sk, k = 0 , . . . , N − 1 .

j∈S

ij

k+1

The optimal cost is J 0( s) and is equal to the length of the shortest path from s to t.

• Observation: An optimal path s → t is also an optimal path t → s in a “reverse” shortest path problem where the direction of each arc is reversed and its length is left unchanged.

• Forward DP algorithm (= backward DP algorithm for the reverse problem):

˜

JN ( j) = a 0 , j ∈ S

sj

1 ,

˜

Jk( j) = min aN−k + ˜

Jk+1( i) , j ∈ SN−k+1

i∈S

ij

N−k

The optimal cost is ˜

J 0( t) = min i∈S

aN + ˜

J

N

it

1( i) .

• View ˜

Jk( j) as optimal cost-to-arrive to state j from initial state s.

A NOTE ON FORWARD DP ALGORITHMS

• There is no forward DP algorithm for stochastic problems.

• Mathematically, for stochastic problems, we cannot restrict ourselves to open-loop sequences, so the shortest path viewpoint fails.

• Conceptually, in the presence of uncertainty, the concept of “optimal-cost-to-arrive” at a state xk does not make sense. The reason is that it may be impossible to guarantee (with prob. 1) that any given state can be reached.

• By contrast, even in stochastic problems, the concept of “optimal cost-to-go” from any state xk makes clear sense.

GENERIC SHORTEST PATH PROBLEMS

• { 1 , 2 , . . . , N, t}: nodes of a graph ( t: the desti-nation)

• aij: cost of moving from node i to node j

• Find a shortest (minimum cost) path from each node i to node t

• Assumption: All cycles have nonnegative length.

Then an optimal path need not take more than N

moves

• We formulate the problem as one where we require exactly N moves but allow degenerate moves from a node i to itself with cost aii = 0.

Jk( i) = optimal cost of getting from i to t in N −k moves.

J 0( i): Cost of the optimal path from i to t.

• DP algorithm:

Jk( i) = min

aij+ Jk+1( j) ,

k = 0 , 1 , . . . , N − 2 , j=1 ,...,N

with JN− 1( i) = ait, i = 1 , 2 , . . . , N.

EXAMPLE

State i

Destination

5

5

3

3

3

3

2

3

4

7

5

4

4

4

5

1

4

3

2

5

5

4.5

4.5

5.5

7

2

6

1

2

2

2

2

1

2

3

0 . 5

0

1

2

3

4

Stage k

(a)

(b)

JN− 1( i) = ait,

i = 1 , 2 , . . . , N,

Jk( i) = min

aij+ Jk+1( j) ,

k = 0 , 1 , . . . , N − 2 .

j=1 ,...,N

Document Outline Probability and Dynamic ProgrammingReview Probability Review

  • Probability and Dynamic ProgrammingReview
  • Probability Review