Game Theory

profilemottcr
hw411.pdf

1

Problem 1. Write code in Python (preferred computational toolkit: numPy, SciPy, iPython, Matplotlib,

CVXPY, and the Control System Toolbox) to implement for generic 2 player finite (matrix) games of the form (A1,A2) that will take in different learning rules for each player and pit them against each other. We know that some will not converge to Nash but that is fine. That is, your code should take in a game of the form (A1,A2) and adaptively update the mixed strategies of the players. In this case, players have finite action spaces Xi and expected costs

where σi ∈ ∆(Xi)—i.e. σi is a mixed policy on Xi. Implement the

following update rules:

1. gradient play (this only needs to work for no more than 2 strategies per player, if a game is

passed in that has more than 2 strategies, return an error)

2. fictitious play

3. best response

4. replicator dynamics where in each round agents draw their actions from the current

distribution on their actions

5. epsilon greedy with parameter εi that is passed in: at each iteration, after drawing random

variable rt ∼ Uniform[0,1.0), then if rt > εi, player i chooses action

where vit(x) is the empirical reward or value of action x ∈ Xi as computed up to time t, otherwise

xti+1 is draw uniformly at random. Ties are broken by choosing uniformly at random.

6. Upper Confidence Bound (this is an optimism in the face of uncertainty policy): at each iteration,

the player chooses an action by selecting the action that has the maximum upper confidence—

i.e. empirical reward plus confidence window:

where tx is the number of times action x has been chosen up to time t. Ties are broken by

choosing uniformly at random.

7. extra credit: fictitious play with the player has an arbitrary belief structure (some distribution

type) for a prior that is updated at each round by computing the posterior given the observed

actions at each round

2

Problem 2.b. Create a test suite that pits each learning rule against one another including the learning

rule against itself.

1. For each game given below, provide plots of the empirical reward distribution for the different

combinations in one round of the tournament between FP, replicator, 𝜖-greedy, UCB for each

player and plots of the empirical action distributions for each player.

2. Repeat many of these rounds in a tournament. For each algorithm pair, across the rounds in the

tournament, provide plots of the sample (learning) paths for each algorithm type, the mean and

±1-std dev of the sample paths for each algorithm type.

3. For the games with less than or equal to 2 strategies per player, show the sample (learning)

paths for gradient play against gradient play, gradient play against best response, and best

response against best response.

4. Analyze the behavior you observe. Do not just provide the plots. Interpret them. What is

happening? Is there a general trend.

5. Add noise to the game matrices A1, A2—that is, generate a small noise parameter that is added

to the entries of the matrices. How does this change the outcomes in the tournament? And the

convergence properties? For example, do any of the algorithm pairs converge to Nash? Do they

converge to the cooperative solution?

6. Create a function to randomly generate two player matrix games. Run the tournament on some

randomly generated matrix games. Can you draw any conclusions about the performance of

any of these algorithms?

Games:

• Coordination game

• Matching Pennies

• Roshambo

• Prisoners Dilemma