Game Theory

Two-Person Zero-Sum Games

Strategic Form: The strategic form, or normal form, of a two-person zero-sum game is given by a triplet $(X, Y, A)$, where

  1. $X$ is a nonempty set, the set of strategies of Player I,
  2. $Y$ is a nonempty set, the set of strategies of Player II,
  3. $A$ is a real-valued function defined on $X \times Y$. (Thus, $A(x, y)$ is a real number for every $x \in X$ and every $y \in Y$.)

The interpretation is as follows. Simultaneously, Player I chooses $x \in X$ and Player II chooses $y \in Y$, each unaware of the choice of the other. Then their choices are made known, and Player I wins the amount $A(x, y)$ from Player II.

Matrix Games: A finite two-person zero-sum game in strategic form, $(X, Y, A)$, is sometimes called a matrix game because the essential information of the game, the payoff function $A$, can be represented by a matrix.
If $X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$, then by the game matrix or payoff matrix we mean the matrix $A = (a_{ij})$ where $a_{ij} = A(x_i, y_j)$.

Removing Dominated Strategies: We say the $i$th row of a matrix $A = (a_{ij})$ dominates the $k$th row if $a_{ij} \geq a_{kj}$ for all $j$, and strictly dominates the $k$th row if $a_{ij} > a_{kj}$ for all $j$. Similarly, the $j$th column of $A$ dominates (strictly dominates) the $k$th column if $a_{ij} \leq a_{ik}$ (resp. $a_{ij} < a_{ik}$) for all $i$.
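The dominance conditions above are easy to check mechanically. Below is a minimal sketch in Python with NumPy (the function names and the `strict` flag are my own choices, not from the text); note that the inequalities flip for columns, since the entries are amounts Player II pays out.

```python
import numpy as np

def row_dominates(A, i, k, strict=False):
    """True if row i of A dominates row k: a_ij >= a_kj for all j
    (strictly: a_ij > a_kj for all j)."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A[i] > A[k])) if strict else bool(np.all(A[i] >= A[k]))

def col_dominates(A, j, k, strict=False):
    """True if column j of A dominates column k: a_ij <= a_ik for all i.
    The inequality is reversed because Player II prefers smaller payouts."""
    A = np.asarray(A, dtype=float)
    return bool(np.all(A[:, j] < A[:, k])) if strict else bool(np.all(A[:, j] <= A[:, k]))
```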

Best Response: Let $(X, Y, A)$ be a matrix game and let $x \in X$ be a strategy of Player I. Then $y \in Y$ is called a best response to $x$ if $A(x, y) \leq A(x, y')$ for every $y' \in Y$. Similarly, let $y \in Y$ be a strategy of Player II. Then $x \in X$ is called a best response to $y$ if $A(x, y) \geq A(x', y)$ for every $x' \in X$.

We may apply the following two principles to analyze the game.

  1. Equilibrium Principle: Best Responses to each other
    This principle involves the interactions of the players.
  2. Maximin Principle: Safety First
    Under this principle, each player is concerned only with his/her own payoff.

Saddle points: If some entry $a_{ij}$ of the matrix $A$ has the property that

  1. $a_{ij}$ is the minimum of the $i$th row, and
  2. $a_{ij}$ is the maximum of the $j$th column,

then we say $a_{ij}$ is a saddle point. If $a_{ij}$ is a saddle point, then Player I can win at least $a_{ij}$ by choosing Row $i$, and Player II can keep her loss to at most $a_{ij}$ by choosing Column $j$. ⟨Row $i$, Column $j$⟩ is then an equilibrium pair, i.e. the two strategies are best responses to each other.
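A saddle point can be found by scanning the matrix for an entry that is both a row minimum and a column maximum. A minimal sketch (the function name `saddle_points` is mine):

```python
import numpy as np

def saddle_points(A):
    """All (i, j) such that A[i, j] is the minimum of row i
    and the maximum of column j."""
    A = np.asarray(A, dtype=float)
    return [(i, j)
            for i in range(A.shape[0])
            for j in range(A.shape[1])
            if A[i, j] == A[i, :].min() and A[i, j] == A[:, j].max()]
```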

The Maximin Principle means finding the “risk” of each strategy and then choosing the strategy with minimum risk, called the safety strategy.

Suppose $A = \begin{pmatrix} -2 & 3 \\ 3 & -4 \end{pmatrix}$. If Player I uses Row 1, the worst payoff is $-2$. If Player I uses Row 2, the worst payoff is $-4$. Therefore, the best of the worst cases is $-2$:
$$\max_i \min_j A(x_i, y_j) = -2$$
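The maximin computation above can be replicated in a couple of NumPy lines:

```python
import numpy as np

A = np.array([[-2, 3], [3, -4]])
row_worst = A.min(axis=1)        # worst payoff of each row: [-2, -4]
lower_value = row_worst.max()    # max_i min_j a_ij = -2
```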

Mixed Strategies: Consider a finite two-person zero-sum game, $(X, Y, A)$, with $m\times n$ matrix $A$. Let us take the strategy space $X$ to be the first $m$ integers, $X = \{1, 2, \ldots, m\}$, and similarly $Y = \{1, 2, \ldots, n\}$. A mixed strategy for Player I may be represented by a column vector $(p_1, p_2, \ldots, p_m)^T$ of probabilities that add up to 1. Similarly, a mixed strategy for Player II is an $n$-tuple $q = (q_1, q_2, \ldots, q_n)^T$.

The sets of mixed strategies of Players I and II will be denoted respectively by $X^*$ and $Y^*$.

$$X^* = \{p = (p_1, \ldots, p_m)^T \mid p_i \geq 0 \text{ for } i = 1, \ldots, m \text{ and } p_1 + \cdots + p_m = 1\}$$

$$Y^* = \{q = (q_1, \ldots, q_n)^T \mid q_j \geq 0 \text{ for } j = 1, \ldots, n \text{ and } q_1 + \cdots + q_n = 1\}$$

$p = (p_1, \ldots, p_m)^T$ means playing Row 1 with probability $p_1$, Row 2 with probability $p_2$, …, and Row $m$ with probability $p_m$.

The $m$-dimensional unit vector $e_k \in X^*$, with a one in the $k$th component and zeros elsewhere, may be identified with the pure strategy of choosing Row $k$.

We could, if we like, consider the game $(X, Y, A)$ in which the players are allowed to use mixed strategies as a new game $(X^*, Y^*, A)$, where $A(p, q) = p^T A q = \sum_i \sum_j p_i a_{ij} q_j$.
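The payoff $A(p, q) = p^T A q$ is just a bilinear form. A small helper (the function name is mine), tried on the earlier $2\times 2$ matrix; the strategies $(7/12, 5/12)$ used below are the optimal ones computed from the standard no-saddle-point $2\times 2$ formula, which give the value $1/12$:

```python
import numpy as np

def expected_payoff(p, A, q):
    """A(p, q) = p^T A q, the expected amount Player I wins from Player II."""
    return float(np.asarray(p) @ np.asarray(A, dtype=float) @ np.asarray(q))
```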

Utility Assumption: The basic premise of utility theory is that one should evaluate a payoff by its utility to the player rather than by its numerical monetary value.

Safety Strategies: For each $p \in X^*$, the worst payoff is $\min_{q\in Y^*} p^T A q$. The minimum is achieved at a pure strategy of Player II.

Theorem: There exists $p' \in X^*$ such that
$$V' = \min_{q\in Y^*} p'^T A q = \max_{p\in X^*}\min_{q\in Y^*} p^T A q$$
$p'$ is called the safety strategy of Player I, and the value $V'$ is called the lower value of the game.

The minimum is achieved by a pure strategy of Player II.

If Player I uses his safety strategy, then he is guaranteed to win at least $V'$ no matter what Player II does.

Theorem 2: There exists $q' \in Y^*$ such that
$$V'' = \max_{p\in X^*} p^T A q' = \min_{q\in Y^*}\max_{p\in X^*} p^T A q$$
$q'$ is called the safety strategy of Player II, and the value $V''$ is called the upper value of the game.

If Player II uses his safety strategy, then he is guaranteed to lose at most $V''$ no matter what Player I does.

Theorem: $V' \leq V''$.

Minimax Theorem: $V' = V''$. The common value $V = V' = V''$ is called the value of the game.

Finding safety strategies for 2-strategy games

Theorem: The two payoff lines meet within $[0, 1]$ if and only if there is no saddle point.

Solving $2\times n$ and $m\times 2$ games:

  1. Check for a saddle point first. If there is a saddle point, then the game is already solved.
  2. Suppose there is no saddle point. Using the graphical method, we can find the safety strategy of Player II for $m\times 2$ games. Since the minimax occurs at the intersection of two lines, Player I, knowing that Player II will play his/her safety strategy, will play the two rows giving rise to those two lines. The situation then reduces to a $2\times 2$ game.

Example: Reduction by removing dominated strategies. Let
$$A = \begin{pmatrix} 0 & 4 & 6 \\ 5 & 7 & 4 \\ 9 & 6 & 3 \end{pmatrix}$$
We can eliminate Column 2 by a mixture of Columns 1 and 3:
$$\frac{1}{3}\begin{pmatrix}0\\5\\9\end{pmatrix} + \frac{2}{3}\begin{pmatrix}6\\4\\3\end{pmatrix} = \begin{pmatrix}4\\13/3\\5\end{pmatrix} \leq \begin{pmatrix}4\\7\\6\end{pmatrix}$$
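The column-mixture domination can be verified numerically:

```python
import numpy as np

A = np.array([[0, 4, 6],
              [5, 7, 4],
              [9, 6, 3]], dtype=float)

mix = (1/3) * A[:, 0] + (2/3) * A[:, 2]       # mixture of Columns 1 and 3
col2_dominated = bool(np.all(mix <= A[:, 1]))  # compare with Column 2
```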

Theorem: Maximin Principle $\Leftrightarrow$ Equilibrium Principle.

Lemma: Let $p \in X^*$ and let $J_p = \{j : \text{Column } j \text{ is a BR to } p\}$ be the set of best response columns to $p$. Then $q = (q_1, \ldots, q_n)^T$ is a BR to $p$ iff $q_j > 0$ implies $j \in J_p$.

The Lemma says that a best response to $p$ (by Player II) uses only the BR columns to $p$. Similarly, a best response to $q$ (by Player I) uses only the BR rows to $q$.

Theorem (Principle of Indifference): Let $p = (p_1, \ldots, p_m)^T \in X^*$ and $q = (q_1, \ldots, q_n)^T \in Y^*$ be such that $\langle p, q\rangle$ is an equilibrium pair, and let $p^T A q = V$. Then $a_{i1}q_1 + \cdots + a_{in}q_n = V$ for all $i$ for which $p_i > 0$, and $a_{1j}p_1 + \cdots + a_{mj}p_m = V$ for all $j$ for which $q_j > 0$.

The Principle of Indifference gives the following recipe to find all the best response strategies to a given strategy :

  1. Find all best response pure strategies.
  2. Use only those BR pure strategies to compose your mixed strategy.

Equilibrium Principle is equivalent to Maximin Principle

Theorem: Let $\langle p_1, q_1\rangle$ and $\langle p_2, q_2\rangle$ be equilibrium pairs. Then $\langle p_1, q_2\rangle$ is also an equilibrium pair.

Equalizing strategy: To check that a strategy is equalizing, it suffices to check that it equalizes the payoffs against the opponent's pure strategies.

Magic Square Games: A magic square game is a matrix game such that the row sums and the column sums of the payoff matrix are all equal. For a $4\times 4$ magic square game, $\left(\frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{4}\right)$ is an equalizing strategy for Player I, and likewise for Player II.

Diagonal Games : A diagonal game is a game such that its payoff matrix is a diagonal matrix.

Solution for diagonal games (diagonal entries $d_1, \ldots, d_n$):

  1. If the $d_i$ are all positive or all negative, then $\frac{1}{1/d_1 + \cdots + 1/d_n}\left(\frac{1}{d_1}, \ldots, \frac{1}{d_n}\right)$ is an equalizing strategy for each player.
  2. Otherwise, $0$ is a saddle point.
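The diagonal-game recipe can be sketched as follows (the function name is mine); the result is automatically equalizing, since $p_i d_i$ comes out the same for every $i$:

```python
import numpy as np

def diagonal_equalizer(d):
    """Equalizing strategy for a diagonal game whose diagonal entries d
    are all positive or all negative: probabilities proportional to 1/d_i."""
    inv = 1.0 / np.asarray(d, dtype=float)
    return inv / inv.sum()
```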

Finding safety (optimal) strategies when the value is given: $\min_q p^T A q = V$ can be expressed as a system of $n$ inequalities by noting that, for a fixed $p$, the minimum of $p^T A q$ is attained at a pure strategy $q$.

Theorem: Let $A = (a_{ij})$ be a matrix game with value $V$. Then $p$ is a safety strategy for Player I iff the following $n$ inequalities are satisfied:
$$p_1 a_{11} + \cdots + p_m a_{m1} \geq V, \quad \ldots, \quad p_1 a_{1n} + \cdots + p_m a_{mn} \geq V$$
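These $n$ inequalities give a direct numerical test (the function name and tolerance are my own). The example reuses the earlier matrix $\begin{pmatrix}-2&3\\3&-4\end{pmatrix}$, whose value is $1/12$ with safety strategy $(7/12, 5/12)$ for Player I by the standard $2\times 2$ formula:

```python
import numpy as np

def is_safety_strategy(p, A, V, tol=1e-9):
    """Check the n inequalities (p^T A)_j >= V for j = 1, ..., n."""
    payoffs = np.asarray(p) @ np.asarray(A, dtype=float)
    return bool(np.all(payoffs >= V - tol))
```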

Symmetric Games: A game is symmetric if the rules do not distinguish between the players. Both players have the same options, and the payoff when Player I uses $i$ and Player II uses $j$ is the negative of the payoff when Player I uses $j$ and Player II uses $i$. Thus the payoff matrix can be arranged to be skew-symmetric, i.e. $A = -A^T$.

Theorem : A finite symmetric game has value zero. Any strategy optimal (safety) for one player is also optimal for the other.

Theorem : Let the value of the game be V. Let P be the set of optimal strategies of Player I. Let Q be the set of optimal strategies of Player II. P, Q are compact convex polyhedra.

Let C be a convex set. A point x in C is called an extreme point if and only if one cannot find y, z in C distinct from x such that x=(y + z)/2. Note that the extreme points of a compact convex polyhedron are its vertices.

Essential submatrix : An essential submatrix of a matrix game is characterized by the following.

  1. It is a square matrix;
  2. Considered as a game in itself, it has a unique equalizing strategy for each player;
  3. The equalizing strategies from (2), extended by 0's as necessary, are optimal in the original game.

The Primal Problem of a linear programming problem is in the following form.

Choose $x_1, \ldots, x_m$ to minimize $c_1 x_1 + \cdots + c_m x_m$

Subject to the constraints :
$$a_{11}x_1 + \cdots + a_{m1}x_m \geq b_1, \quad \ldots, \quad a_{1n}x_1 + \cdots + a_{mn}x_m \geq b_n$$

and $x_i \geq 0$ for $i = 1, \ldots, m$.

The Dual Problem to the primal problem is to choose $y_1, \ldots, y_n$ to maximize $b_1 y_1 + \cdots + b_n y_n$

Subject to the constraints :
$$a_{11}y_1 + \cdots + a_{1n}y_n \leq c_1, \quad \ldots, \quad a_{m1}y_1 + \cdots + a_{mn}y_n \leq c_m$$
and $y_j \geq 0$ for $j = 1, \ldots, n$.

Theorem (Duality Theorem of LP): The optimal value of the Primal Problem and the optimal value of the Dual Problem are equal.

The Duality Theorem implies the Minimax Theorem in game theory.

Solution to the Primal Problem in 2-Person Zero-Sum Game :

Choose $p_1, \ldots, p_m$ to maximize $v$

Subject to the constraints :
$$a_{11}p_1 + \cdots + a_{m1}p_m \geq v, \quad \ldots, \quad a_{1n}p_1 + \cdots + a_{mn}p_m \geq v, \quad p_1 + \cdots + p_m = 1$$
and $p_i \geq 0$ for $i = 1, \ldots, m$.

Let $x_i = \frac{p_i}{v}$ (we may assume $v > 0$, e.g. after adding a large constant to every entry of $A$); then

Choose $x_1, \ldots, x_m$ to minimize $x_1 + \cdots + x_m$

Subject to the constraints :
$$a_{11}x_1 + \cdots + a_{m1}x_m \geq 1, \quad \ldots, \quad a_{1n}x_1 + \cdots + a_{mn}x_m \geq 1$$
and $x_i \geq 0$ for $i = 1, \ldots, m$.
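The transformed problem is a standard LP, so any LP solver applies. A sketch with SciPy's `linprog` (the transformation assumes the value of $A$ is positive, which holds for the earlier $2\times 2$ example with value $1/12$):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2, 3], [3, -4]], dtype=float)   # a game whose value (1/12) is positive
m, n = A.shape

# minimize x_1 + ... + x_m  subject to  A^T x >= 1  and  x >= 0,
# rewritten for linprog as  -A^T x <= -1
res = linprog(c=np.ones(m), A_ub=-A.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)

v = 1.0 / res.fun     # value of the game: sum of x_i equals 1/v
p = res.x * v         # safety (optimal) strategy for Player I: p_i = v * x_i
```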

Games in Extensive Form

In the extensive form, games are sequential, interactive processes that move from one position to another in response to the wills of the players or the will of chance.

Games in Extensive Form : Three new concepts make their appearance in the extensive form of a game: the game tree, chance moves, and information sets

The Game Tree: The extensive form of a game is modeled using a directed graph, as we have seen in combinatorial games. The vertices represent positions of the game. The followers, $F(x)$, of a position $x$ are those positions that can be reached from $x$ in one move.

A path from a vertex $t_0$ to a vertex $t_1$ then represents the whole history of play in the game moving from position $t_0$ to $t_1$.

The Kuhn tree of a game: When a tree is used to define a game, the vertices are interpreted as positions and the edges as moves. Each nonterminal position is assigned either to the player responsible for choosing the next move or to chance.

Chance Moves : Many games involve chance moves. It is assumed that the players are aware of the probabilities of the various outcomes resulting from a chance move. To incorporate chance moves in the Kuhn tree, we introduce a new player called Chance. The moves from Chance carry different probabilities. The probabilities assigned to the edges leading out of each chance vertex must be displayed.

Play starts at the initial vertex and continues along one of the paths eventually ending in one of the terminal vertices. Each directed path from the root to terminal vertex describes a possible course of play.

Information: Another important aspect we must consider in studying the extensive form of games is the amount of information available to the players about past moves of the game. When a player choosing a move is uninformed about some of the previous moves (concealed moves, simultaneous moves, etc.), we draw little “balloons” around the sets of vertices that the player making the choice cannot distinguish. These are called information sets, and the player in question must make the same choice at all vertices in the set (the player has the same information at every vertex in the same balloon).

Edges issuing from vertices in the same information set must be labeled the same, for example {paper, scissors, rock}.

In summary, the following are the basic elements of a Kuhn tree :

  1. A finite set N, representing the players.
  2. A finite rooted tree representing positions and moves. The root represents the start of the game and the terminal vertices the possible conclusions.
  3. Terminal vertices represent the possible conclusions of the game. The payoff to each player is given at each terminal vertex as a vector.
  4. Each nonterminal vertex is labeled with the name of the player (or Chance) who moves there. A move at such a position means choosing a particular edge leading out from that vertex.
  5. Edges coming out from Chance should be labeled with the probability of this particular occurrence.
  6. Information sets: the vertices of each player are partitioned into information sets. The player cannot distinguish the vertices within an information set. Hence there is the same number of edges, with the same labels, leading out from each vertex in the information set.

Games of Perfect Information: A game of perfect information is a game in extensive form in which each information set of every player contains a single vertex.

Games of perfect recall : Games in which players remember all past information they once knew and all past moves they made are called games of perfect recall.

Games of complete information : Games in which both players know the rules of the game, that is, in which both players know the Kuhn tree, are called games of complete information.

Reduction of a Game in Extensive Form to Strategic Form

Pure strategy: A pure strategy is a player's complete plan for playing the game. It should cover every contingency. A pure strategy for a player is a rule that tells him exactly what move to make in each of his information sets: it specifies a particular edge leading out from each information set.

If the sets of edges coming out from the information sets are $\{A,B\}$, $\{C,D\}$, $\{E,F\}$, the set of pure strategies is $\{A,B\}\times\{C,D\}\times\{E,F\} = \{ACE, ACF, ADE, ADF, BCE, BCF, BDE, BDF\}$.
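This Cartesian-product construction is exactly `itertools.product`:

```python
from itertools import product

# One tuple of edge labels per information set
info_set_choices = [("A", "B"), ("C", "D"), ("E", "F")]

# Each pure strategy picks one edge from every information set
pure_strategies = ["".join(s) for s in product(*info_set_choices)]
```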

Reduced pure strategy: a specification of choices at all information sets except those that are ruled out by the player's own previous moves.

Random payoffs : The actual outcome of the game for given pure strategies of the players depends on the chance moves selected, and is therefore a random quantity. We represent random payoffs by their expected values.

Game in strategic form (normal form):

  • Set of players: $\{1, \ldots, n\}$
  • A set of pure strategies $X_i$ for each player $i$, $i = 1, \ldots, n$
  • A payoff function for the $i$th player, $u_i : X_1 \times \cdots \times X_n \to \mathbb{R}$

Set of reduced strategies for I: $\{AE, AF, BC, BD\}$

Set of reduced strategies for II: $\{ac, ad, bc, bd\}$

Strategic Form: rows $AE, AF, BC, BD$ are Player I's reduced strategies and columns $ac, ad, bc, bd$ are Player II's; only one payoff pair per row is recoverable from the source: $AE$: $(10, 0)$, $AF$: $(0, 20)$, $BC$: $(-2.3, 2.2)$, $BD$: $(-0.5, -2.3)$.

Strategic Equilibrium: A vector of pure strategy choices $(x_1, x_2, \ldots, x_n)$ with $x_i \in X_i$ for $i = 1, \ldots, n$ is said to be a pure strategic equilibrium, or PSE for short, if for all $i = 1, 2, \ldots, n$ and for all $x \in X_i$,
$$u_i(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) \geq u_i(x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n)$$
The inequality says that if the players other than player $i$ use their indicated strategies, then the best player $i$ can do is to use $x_i$. Such a pure strategy choice of player $i$ is called a best response to the strategy choices of the other players.

Labeling Algorithm for Finding All PSE’s(2 Person)

  • Put an asterisk after each of Player I’s payoffs that is a maximum of its column.
  • Put an asterisk after each of Player II’s payoffs that is a maximum of its row.
  • Then any entry of the matrix at which both I’s and II’s payoffs have asterisks is a PSE, and conversely.
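The labeling algorithm above translates directly into code (the function name is mine): collect the entries that are simultaneously a column maximum of Player I's payoffs and a row maximum of Player II's. It is tried here on the Crossing Game payoffs that appear later in these notes, and on matching pennies, which has no PSE:

```python
import numpy as np

def pure_strategic_equilibria(A, B):
    """All (i, j) where A[i, j] carries Player I's asterisk (column max of A)
    and B[i, j] carries Player II's asterisk (row max of B)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return [(i, j)
            for i in range(A.shape[0])
            for j in range(A.shape[1])
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max()]
```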

Bad News : Many games have no PSE’s

Method of Backward Induction: Start from any terminal vertex and trace back to the vertex leading to it. The player at this vertex will discard the edges with lower payoff. Then treat this vertex as a terminal vertex and repeat the process. The result is a path from the root to a terminal vertex.

Theorem : The path obtained by the method of backward induction defines a PSE.

If a strategic form comes from an extensive form with perfect information, then there exists at least one PSE, obtained by the method of backward induction.

Theorem : The PSE obtained by the method of backward induction satisfies stronger properties so that it is called a perfect pure strategy equilibrium or PPSE.

Subgame: A subgame of a game presented in extensive form is obtained by taking a vertex in the Kuhn tree together with all the edges and paths originating from this vertex.

Definition: A PSE of a game in extensive form is called a Perfect Pure Strategy Equilibrium (PPSE) if it is a PSE for all subgames.

Subgame perfection implies that, when making choices, a player looks forward and assumes that the choices that will subsequently be made by himself and by others will be rational. Threats that would be irrational to carry through are ruled out. It is precisely this kind of forward-looking rationality that is most suited to economic applications.

PSE allow players to make noncredible threats provided they never have to carry them out. Decisions on the equilibrium path are driven in part by what the players expect will happen off the equilibrium path.

Behavior Strategies : A behavior strategy is obtained if the Player makes a randomized choice independently at each of his/her information sets.

Example: Player I has 3 information sets. The sets of edges coming out from the information sets are $P = \{A, B\}$, $Q = \{E, F\}$, $R = \{C, D\}$.

A behavior strategy $(p, q, r)$ (choosing $A$, $E$, $C$ with probabilities $p$, $q$, $r$ respectively, independently) gives probability $pqr$ to ACE, $pq(1-r)$ to ADE, $p(1-q)r$ to ACF, and $p(1-q)(1-r)$ to ADF. This induces a mixed strategy for Player I.
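Since the choices at different information sets are made independently, the induced mixed strategy is just the product of the per-set choice probabilities. A generic sketch (the function name and input format are my own):

```python
from itertools import product

def behavior_to_mixed(choice_probs):
    """choice_probs: one dict {edge_label: probability} per information set.
    Returns the induced probability of each pure strategy, multiplying the
    independent per-set choice probabilities."""
    mixed = {}
    for combo in product(*(d.items() for d in choice_probs)):
        label = "".join(edge for edge, _ in combo)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        mixed[label] = prob
    return mixed
```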

Each behavior strategy induces a probability distribution on the set of terminal vertices.

Each mixed strategy induces a probability distribution on the set of terminal vertices.

Dimension of behavior strategies: Suppose a player has $m$ information sets with $k_i$ choices at the $i$th information set. Then the dimension of the space of behavior strategies for this player is $(k_1 - 1) + (k_2 - 1) + \cdots + (k_m - 1)$. Note that the dimension of the space of mixed strategies is $k_1 k_2 \cdots k_m - 1$.

Question: Can each mixed strategy be represented by an “equivalent” behavior strategy?

From the consideration of dimension, the answer should be negative without additional assumptions.

Theorem (Harold Kuhn) : For a game of perfect recall, mixed strategies are equivalent to behavior strategies.

Two-Person General-Sum Games

General-Sum Strategic Form Games: The normal or strategic form of a two-person game is given by two sets $X$ and $Y$ of pure strategies of the players, and two real-valued functions $u_1(x, y)$ and $u_2(x, y)$ defined on $X \times Y$, representing the payoffs to the two players. If Player I chooses $x \in X$ and Player II chooses $y \in Y$, then Player I receives $u_1(x, y)$ and Player II receives $u_2(x, y)$.

The Maximin Principle does not apply to bimatrix games!

The theory is generally divided into two branches, the noncooperative theory and the cooperative theory.

  1. In the noncooperative theory, either the players are unable to communicate before decisions are made, or if such communication is allowed, the players are forbidden or are otherwise unable to make a binding agreement on a joint choice of strategy. The main noncooperative solution concept is the strategic equilibrium (SE)
  2. In the cooperative theory, it is assumed that the players are allowed to communicate before the decisions are made. They may make threats and counter-threats, proposals and counterproposals, and hopefully come to some compromise. They may jointly agree to use certain strategies, and it is assumed that such an agreement can be made binding. The cooperative theory itself breaks down into two branches, depending on whether or not the players have comparable units of utility and are allowed to make monetary side payments in units of utility as an incentive to induce certain strategy choices. The corresponding solution concept is called:
    1. TU cooperative value if side payments are allowed.
    2. NTU cooperative value if side payments are forbidden or otherwise unattainable

Safety Levels: In a bimatrix game with $m\times n$ matrices $A$ and $B$, Player I can guarantee winning on average at least $v_1 = \max_p \min_j (p_1 a_{1j} + \cdots + p_m a_{mj}) = \mathrm{Val}(A)$. This is called the safety level of Player I. A strategy $p$ that achieves the maximum is called a maximin strategy for Player I.

Strategic Equilibrium (noncooperative games): A vector of mixed strategy choices (also called a strategy profile) $(p_1, p_2, \ldots, p_n)$ with $p_i \in X^*_i$ for $i = 1, \ldots, n$ is said to be a strategic equilibrium, or SE for short, if for all $i = 1, 2, \ldots, n$ and for all $p \in X^*_i$,
$$u_i(p_1, \ldots, p_{i-1}, p_i, p_{i+1}, \ldots, p_n) \geq u_i(p_1, \ldots, p_{i-1}, p, p_{i+1}, \ldots, p_n) \qquad (*)$$
Any mixed strategy $p_i$ that satisfies $(*)$ for all $p \in X^*_i$ is a best response of player $i$ to the mixed strategies of the other players.

Theorem (Nash): Every finite $n$-person game in strategic form has at least one strategic equilibrium.

Theorem: $\langle p, q\rangle$ is a SE whenever the following two conditions hold:

  1. $p_i > 0$ implies Row $i$ is a BR to $q$.
  2. $q_j > 0$ implies Column $j$ is a BR to $p$.

Labeling algorithm to check for SE's: Let $[A, B]$ be a bimatrix game. Given a mixed strategy $p = (p_1, \ldots, p_m)$ for Player I and a mixed strategy $q = (q_1, \ldots, q_n)$ for Player II, there is a simple labeling algorithm to determine whether $\langle p, q\rangle$ is a SE.

  1. Put the label BR on the best response rows to $q$.
  2. Put the label $+$ on each row $i$ where $p_i$ is positive.
  3. Put the label BR on the best response columns to $p$.
  4. Put the label $+$ on each column $j$ where $q_j$ is positive.
  5. $\langle p, q\rangle$ is a SE whenever every $+$ label is accompanied by a BR label.

Theorem: $\langle p, q\rangle$ is a SE if $p, q, \alpha, \beta$ solve the following optimization problem, where $\Pi_k = (1, 1, \ldots, 1)^T$ denotes the all-ones vector:
$$\max\; p^T(A+B)q - \alpha - \beta$$
subject to the constraints
$$Aq - \alpha \Pi_m \leq 0, \quad p^T B - \beta \Pi_n^T \leq 0, \quad p^T \Pi_m = 1, \quad q^T \Pi_n = 1, \quad p \geq 0, \quad q \geq 0$$

Remark: The optimal value is 0.

Tetraskelion Method to find all SE's for $2\times 2$ games:

  1. For each mixed strategy $(p, 1-p)$ of Player I, use the method for zero-sum games to find the best response columns.
  2. Plot this in the unit square as a graph parameterized by $p$, $0 \leq p \leq 1$.
  3. For each mixed strategy $(q, 1-q)$ of Player II, use the method for zero-sum games to find the best response rows.
  4. Plot this in the same unit square as a graph parameterized by $q$, $0 \leq q \leq 1$.
  5. The SE's are the intersections of the two graphs.

Duopoly : There are two firms sending out fishing boats every day to catch fish to sell in the market the same day. The day’s catch of fish is brought into port by 9:00 am and must be sold the same day. Each day is taken to be entirely separate in the sense that the two firms concern themselves with the profit of only the current day and engage in no longer-term planning at all. When they set sail, each has already decided on the catch to bring back that day, and neither knows what the other has decided. Perhaps each has an educated guess, but neither has spies in the other’s camp or any source of factual knowledge of the other’s commitments. Each of them, however, does know both cost functions, as well as the market demand function which is a single market price because the catches of the two firms are identical. The decision variables of the firms are their output levels.

Cournot 1838: Two firms produce the same product. Cost function for Firm 1: $C_1(q_1) = q_1$. Cost function for Firm 2: $C_2(q_2) = q_2$, where $q_1, q_2$ are the quantities produced. Price function: $P(q) = 25 - q$, where $q$ is the total amount in the market.

Duopoly: Firm 1 and Firm 2 use quantity as strategy. Profit to Firm 1: $P_1(q_1) = (25 - q)q_1 - q_1$. Profit to Firm 2: $P_2(q_2) = (25 - q)q_2 - q_2$, where $q = q_1 + q_2$. Firm 1 chooses $q_1$ to maximize its profit; Firm 2 chooses $q_2$ to maximize its profit.

Stackelberg Leader and Follower: Suppose Firm 1 moves first and announces the quantity $q_1$ (its strategy). Knowing this, Firm 2 chooses its quantity $q_2$ to maximize profit (as a best response). Then $q_2 = \frac{1}{2}(24 - q_1)$. Firm 1 knows this, so its profit is $(25 - q)q_1 - q_1$, where $q = q_1 + \frac{1}{2}(24 - q_1)$.
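The two outcomes can be checked numerically. With these cost and price functions, working from the best-response formula above, the Cournot equilibrium is $q_1 = q_2 = 8$ and the Stackelberg outcome is $q_1 = 12$, $q_2 = 6$ (my computation, not stated in the text). A sketch:

```python
def cournot_br(q_other):
    # Best response: maximize (25 - q_i - q_other) * q_i - q_i over q_i,
    # i.e. (24 - q_i - q_other) * q_i, giving q_i = (24 - q_other) / 2.
    return (24 - q_other) / 2

def leader_profit(q1):
    # Stackelberg: the follower best-responds with q2 = (24 - q1) / 2.
    q2 = cournot_br(q1)
    return (25 - q1 - q2) * q1 - q1

# Cournot equilibrium by best-response iteration
q = 0.0
for _ in range(60):
    q = cournot_br(q)

# Stackelberg leader quantity by a coarse grid search over [0, 24]
q1_star = max((i * 0.01 for i in range(2401)), key=leader_profit)
q2_star = cournot_br(q1_star)
```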

Correlated Equilibrium

Crossing Game:

|       | Cross        | Yield      |
|-------|--------------|------------|
| Cross | $(-10, -10)$ | $(5, 0)$   |
| Yield | $(0, 5)$     | $(-1, -1)$ |

A traffic light might be installed. The traffic light is merely a randomization device which, with certain probabilities, tells each of the two drivers whether to Cross or to Yield. This is not much different from a pair of mixed strategies, except that now the two players' probability distributions on strategies can be correlated. For example, the traffic light can be

|       | Cross | Yield |
|-------|-------|-------|
| Cross | 0     | 0.55  |
| Yield | 0.40  | 0.05  |

Will the players comply, or will they deviate from the traffic light's instruction? If the traffic light is operated in such a way that both players see that it is in their best interest to comply, then we have an equilibrium! We will call this a Correlated Equilibrium.

Suppose Player I knows that Player II will comply with the traffic light's instructions. When Player I sees the signal to Cross, he will not deviate: he knows that (Cross, Cross) occurs with probability 0, so Player II must have been told to Yield, and Crossing gives him 5 rather than 0. When Player I sees the signal to Yield, he can compute the conditional probability of Player II's instruction given that his own instruction is Yield:

$$Prob(\text{II to Cross} \mid \text{I to Yield}) = \frac{Prob(\text{II to Cross, I to Yield})}{Prob(\text{I to Yield})} = \frac{0.4}{0.45} = \frac{8}{9}$$

$$Prob(\text{II to Yield} \mid \text{I to Yield}) = \frac{Prob(\text{II to Yield, I to Yield})}{Prob(\text{I to Yield})} = \frac{0.05}{0.45} = \frac{1}{9}$$

Then the expected payoff to Player I if he complies with the instruction to Yield is $0 \cdot \frac{8}{9} + (-1) \cdot \frac{1}{9} = -\frac{1}{9}$. The expected payoff to Player I if he deviates from the instruction to Yield is $(-10) \cdot \frac{8}{9} + 5 \cdot \frac{1}{9} = -\frac{75}{9}$. This is far inferior to his payoff if he complies. Player II reasons the same way.
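The conditional-probability calculation can be spot-checked mechanically with exact fractions (a sketch; variable names are ours):

```python
from fractions import Fraction as F

# Crossing-game payoffs to Player I; rows/cols indexed 0 = Cross, 1 = Yield.
A = [[-10, 5], [0, -1]]
# Traffic-light joint distribution over (I's instruction, II's instruction).
p = [[F(0), F(55, 100)], [F(40, 100), F(5, 100)]]

row = 1                                   # Player I is told Yield
row_mass = sum(p[row])                    # Prob(I to Yield) = 9/20
cond = [pij / row_mass for pij in p[row]] # [8/9, 1/9]

comply  = sum(cond[j] * A[1][j] for j in range(2))   # play Yield: -1/9
deviate = sum(cond[j] * A[0][j] for j in range(2))   # play Cross: -75/9

print(cond, comply, deviate)
```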

Therefore, it is in the best interest of both players to comply with the instruction of the traffic light. This probability distribution over the outcomes is then called a Correlated Equilibrium.

Correlated Equilibrium : In general, let $[A, B]$ be an $m \times n$ bimatrix game, where $A = (a_{ij})$, $B = (b_{ij})$. A probability distribution over the outcomes of this game is given by $p_{ij}$, with $p_{ij} \geq 0$ and $\sum_{i,j} p_{ij} = 1$; the players are instructed to play (Row i, Column j) with probability $p_{ij}$. This probability distribution is called a Correlated Equilibrium (CE) if it is in the players' best interest to comply with the instruction.

$$Prob(\text{Column } j \text{ for II} \mid \text{Row } i \text{ for I}) = \frac{Prob(\text{Column } j \text{ for II}, \text{Row } i \text{ for I})}{Prob(\text{Row } i \text{ for I})} = \frac{p_{ij}}{\sum_k p_{ik}}$$

For Player I to comply with the instruction to play Row i, complying must pay at least as well as deviating to any Row r instead. This means

$$\sum_j \frac{p_{ij}}{\sum_k p_{ik}} a_{ij} \geq \sum_j \frac{p_{ij}}{\sum_k p_{ik}} a_{rj} \quad \forall r \in \{1, 2, \dots, m\}$$
Equivalently, and similarly for Player II,
$$\sum_j p_{ij}(a_{ij} - a_{rj}) \geq 0 \quad \forall i, r = 1, \dots, m$$
$$\sum_i p_{ij}(b_{ij} - b_{is}) \geq 0 \quad \forall j, s = 1, \dots, n$$
For an $m \times n$ game, there are $m(m-1) + n(n-1)$ inequalities characterizing a CE.

Proposition: Any SE is a CE (correlated equilibrium).

Proposition: The convex combination of two CE’s is a CE.

The corresponding payoff of a CE may be outside the convex hull of the corresponding payoff points of SEs.

Proposition: Let [A, -A] be a 2-person 0-sum game with value V. For any CE of [A, -A], the payoff vector is (V, -V).

Computation of CE : Find $(p_{ij})$ such that
$$\sum_j p_{ij}(a_{ij} - a_{rj}) \geq 0 \quad \forall i, r = 1, \dots, m$$
$$\sum_i p_{ij}(b_{ij} - b_{is}) \geq 0 \quad \forall j, s = 1, \dots, n$$
$$p_{ij} \geq 0, \quad \sum_{i,j} p_{ij} = 1$$
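For a given candidate distribution, these inequalities can also be verified directly. A minimal sketch (the function name is ours), checked on the crossing game's traffic-light distribution:

```python
def is_correlated_equilibrium(A, B, p, tol=1e-9):
    """Check the CE inequalities for a joint distribution p over outcomes."""
    m, n = len(A), len(A[0])
    for i in range(m):                     # Player I: no gain from i -> r
        for r in range(m):
            if sum(p[i][j] * (A[i][j] - A[r][j]) for j in range(n)) < -tol:
                return False
    for j in range(n):                     # Player II: no gain from j -> s
        for s in range(n):
            if sum(p[i][j] * (B[i][j] - B[i][s]) for i in range(m)) < -tol:
                return False
    return True

# Crossing game with the traffic-light distribution:
A = [[-10, 5], [0, -1]]
B = [[-10, 0], [5, -1]]
p = [[0.0, 0.55], [0.40, 0.05]]
print(is_correlated_equilibrium(A, B, p))   # True
```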
We can represent the above inequalities as finding an optimal strategy, ( p i j ) (p_{ij}) (pij), of the row chooser of a certain 2-person 0-sum game whose value is nonnegative. The payoff matrix A of this game is an m n × ( m ( m − 1 ) + n ( n − 1 ) ) mn \times (m(m-1) + n(n-1)) mn×(m(m1)+n(n1)) matrix.

For 2 × 2 2\times 2 2×2 Game,

|          | $(i,r)=(1,2)$   | $(i,r)=(2,1)$   | $(j,s)=(1,2)$   | $(j,s)=(2,1)$   |
|----------|-----------------|-----------------|-----------------|-----------------|
| $p_{11}$ | $a_{11}-a_{21}$ | 0               | $b_{11}-b_{12}$ | 0               |
| $p_{12}$ | $a_{12}-a_{22}$ | 0               | 0               | $b_{12}-b_{11}$ |
| $p_{21}$ | 0               | $a_{21}-a_{11}$ | $b_{21}-b_{22}$ | 0               |
| $p_{22}$ | 0               | $a_{22}-a_{12}$ | 0               | $b_{22}-b_{21}$ |
  1. The rows of A are labeled by $ij$, $i = 1, \dots, m$, $j = 1, \dots, n$.
  2. The columns of A are labeled by $(i, r)$, $i \neq r$, $i, r = 1, \dots, m$, and by $(j, s)$, $j \neq s$, $j, s = 1, \dots, n$.
  3. The entry at row $ij$ and column $(i, r)$ is $a_{ij} - a_{rj}$.
  4. The entry at row $ij$ and column $(j, s)$ is $b_{ij} - b_{is}$.
  5. All other entries are 0.

Application of Game Theory to Biology

Example : Hawk & Dove

  • Hawk: keep fighting until you or your opponent is injured
  • Dove: run away
  • Winner gets a prize of V and injury means your fitness goes down by D.
|      | Hawk                                   | Dove                           |
|------|----------------------------------------|--------------------------------|
| Hawk | $(\frac{1}{2}(V-D), \frac{1}{2}(V-D))$ | $(V, 0)$                       |
| Dove | $(0, V)$                               | $(\frac{1}{2}V, \frac{1}{2}V)$ |

We are then studying the symmetric bimatrix game $[A, A^T]$. From game theory, we arrive at the concept of finding a strategy p such that <p, p> is a SE, i.e. p is a BR to p.

Evolutionarily Stable Strategy ($p^{ESS}$) : A mutation occurs, producing a small number of players with a mutant strategy $p^M \neq p^{ESS}$. The proportion of the population with this mutant strategy is $\varepsilon$. Then for a player using $p^{ESS}$ the expected payoff is
$$\pi^{ESS} = (1-\varepsilon)\,\pi(p^{ESS}, p^{ESS}) + \varepsilon\,\pi(p^{ESS}, p^M)$$
For a player using $p^M$ the payoff is
$$\pi^M = (1-\varepsilon)\,\pi(p^M, p^{ESS}) + \varepsilon\,\pi(p^M, p^M)$$
We want $\pi^M < \pi^{ESS}$. This is guaranteed if, for every $p^M \neq p^{ESS}$:

  • $\pi(p^{ESS}, p^{ESS}) \geq \pi(p^M, p^{ESS})$
  • $\pi(p^M, p^M) < \pi(p^{ESS}, p^M)$
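For the Hawk-Dove game with $V < D$, the mixed strategy that plays Hawk with probability $V/D$ is the standard candidate ESS. A sketch that spot-checks the two conditions above against a grid of mutant strategies, using hypothetical values $V = 4$, $D = 6$:

```python
# Spot-check of the two ESS conditions for Hawk-Dove with hypothetical
# V = 4, D = 6 (so V < D and the candidate ESS is mixed).
V, D = 4.0, 6.0
A = [[(V - D) / 2, V], [0.0, V / 2]]   # row player's payoffs, order (Hawk, Dove)

def pi(p, q):
    """Expected payoff of mixed strategy p against q (p, q = prob. of Hawk)."""
    pv, qv = [p, 1 - p], [q, 1 - q]
    return sum(pv[i] * A[i][j] * qv[j] for i in range(2) for j in range(2))

pE = V / D          # candidate ESS: play Hawk with probability V/D = 2/3

for k in range(101):
    pM = k / 100
    if abs(pM - pE) < 1e-9:
        continue
    # Condition 1: pE is a best response to itself (holds with equality here).
    assert pi(pE, pE) >= pi(pM, pE) - 1e-9
    # Condition 2: pE does strictly better against the mutant than the mutant.
    assert pi(pE, pM) > pi(pM, pM)
print("both ESS conditions hold on the mutant grid")
```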

Theorem : Suppose the payoff matrix is $(A, A^T)$, and suppose the jth column of $A$ is such that its diagonal element $a_{jj}$ is strictly the largest in the column. Then playing the jth strategy is an ESS.

Necessary and sufficient condition for an ESS : Let [ A , A T ] [A, A^T] [A,AT] be a given bimatrix game, where A A A is an n × n n\times n n×n matrix. Let p = ( p 1 , … , p n ) p=(p_1,…,p_n) p=(p1,,pn) be a mixed strategy. Then p p p is an ESS iff

  1. p j > 0 p_j>0 pj>0 implies jth strategy is a BR row to p p p
  2. Let $B$ be the matrix $(b_{ij})$ such that $b_{ij} = a_{ij}$ if i, j are BR rows to $p$ and $b_{ij} = 0$ otherwise. Then $Z^T B Z < 0$ for every $Z^T = (z_1, \dots, z_n) \neq 0$ satisfying
    1. $z_i = 0$ if i is not a BR row to $p$, and
    2. $z_1 + \dots + z_n = 0$.

Let $A$ be the Paper-Scissors-Rock payoff matrix. Then $[A, A^T]$ does not have an ESS.

An evolutionary process has two basic elements: mutation and a selection mechanism. The ESS concept highlights mutation; replicator dynamics highlights the role of selection.

Replicator dynamics is formalized as a system of ODEs that does not include any mutation mechanism. Instead, robustness against mutation is indirectly taken care of by dynamic stability criteria.

Replicator Dynamics : Let $p = (p_1, \dots, p_n)^T$ be a mixed strategy for a symmetric bimatrix game $[A, A^T]$, where $p_i$ denotes the proportion of the population using the ith strategy. The payoff to strategy i against $p$ is $\sum_j a_{ij} p_j$. The payoff to strategy $p$ against $p$ is $\sum_i \sum_j a_{ij} p_i p_j$.

We assume the growth rate of the population using strategy i is proportional to the advantage of that strategy over the whole population. For simplicity, we assume the constant is 1. Therefore, growth rate of the population playing strategy i is
$$\sum_j a_{ij} p_j - \sum_{k,j} p_k a_{kj} p_j$$
Replicator Dynamics Equations : $\frac{dp_i}{dt} = \left[\sum_j a_{ij} p_j - \sum_{k,j} p_k a_{kj} p_j\right] p_i$, for $i = 1, \dots, n$.

An equilibrium point of the set of Replicator Dynamics Equations is a solution $p$ for which the RHS is 0.

RHS = 0 iff $\sum_j a_{ij} p_j = \sum_{k,j} p_k a_{kj} p_j$ for all i with $p_i \neq 0$, iff $p$ is a BR to $p$.

Therefore, $p$ is an equilibrium point of the Replicator Dynamics Equations iff $(p, p)$ is a SE.
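The replicator equations above can be integrated numerically. A minimal Euler-integration sketch for the Hawk-Dove game (hypothetical $V = 4$, $D = 6$); the hawk share should settle at the interior equilibrium $V/D = 2/3$:

```python
# Euler-integrated replicator dynamics for Hawk-Dove (hypothetical V = 4,
# D = 6); the hawk share should approach the interior equilibrium V/D = 2/3.
V, D = 4.0, 6.0
A = [[(V - D) / 2, V], [0.0, V / 2]]

def step(p, dt=0.01):
    fitness = [sum(A[i][j] * p[j] for j in range(2)) for i in range(2)]
    avg = sum(p[i] * fitness[i] for i in range(2))
    # dp_i/dt = (fitness_i - average fitness) * p_i
    return [p[i] + dt * (fitness[i] - avg) * p[i] for i in range(2)]

p = [0.9, 0.1]               # start with 90% hawks
for _ in range(20000):
    p = step(p)
print(p)                     # hawk share ≈ 2/3
```

Note that the Euler step preserves $\sum_i p_i = 1$ exactly, since the growth terms sum to zero.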

Repeated Games

Repeated Game : Introduce a discount factor $\beta$. A payoff of x at the nth game is worth $\beta^{n-1} x$.

Finite Automata :

  1. Input: some “history” of the game (Player I's action at the nth stage).
  2. Output: Player I's action at the (n+1)th stage.
  3. A finite automaton can only remember a finite number of things.

1-memory automata : The player remembers only what happened in the previous move and makes a decision based on that: $f: \{CC, CD, DC, DD\} \rightarrow \{C, D\}$. When both players use 1-memory automata, we can use the technique of a transition matrix. Each row of the matrix is a probability vector giving the probability of transiting to one of the 4 possible states (CC, CD, DC, DD).

|    | CC       | CD       | DC | DD |
|----|----------|----------|----|----|
| CC | $p_{11}$ | $p_{12}$ | …  | …  |
| CD | $p_{21}$ | …        | …  | …  |
| DC | …        | …        | …  | …  |
| DD | …        | …        | …  | …  |

Let $v^T = (p_1, p_2, p_3, p_4)$ be a probability vector of the outcomes from the previous game. Then the distribution for the next game is $(v^T P)^T = P^T v$.
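As a concrete special case: if both players use deterministic Tit-for-Tat, each next move copies the opponent's last move, so CC→CC, CD→DC, DC→CD, DD→DD and the transition matrix degenerates to a 0/1 permutation. A sketch of one step of $v^T P$:

```python
# States ordered (CC, CD, DC, DD); state XY = (Player I's move, Player II's).
# With both players using Tit-for-Tat, each next move copies the opponent's
# last move, so the "probability" matrix is the permutation below.
P = [
    [1, 0, 0, 0],   # from CC
    [0, 0, 1, 0],   # from CD
    [0, 1, 0, 0],   # from DC
    [0, 0, 0, 1],   # from DD
]

def next_dist(v, P):
    """One step of v^T P: distribution over next-stage outcomes."""
    return [sum(v[i] * P[i][j] for i in range(4)) for j in range(4)]

v = [0.25, 0.25, 0.25, 0.25]
print(next_dist(v, P))      # CD and DC swap their mass; CC and DD keep theirs
```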

PR (Permanent Retaliation) : Cooperate until, if ever, the opponent defects; then defect forever.

Tit-for-Tat (TFT) : Cooperate first, then repeat your opponent's previous move.

<PR, PR> is a SE if β is large enough.

<TFT, TFT> is a SE when β is large enough (β>1/6 in this case).

A strategy is said to be Nice if it starts by cooperating and is never the first to defect.

Let S be a nice strategy such that <S, S> is a SE. Then β must be sufficiently large.

Let S be a nice strategy such that <S, S> is a SE. Then S must be provocable by the opponent's defection: when the opponent plays D, S must retaliate by playing D at some later move.

Cooperative Games

In the cooperative theory, we allow communication of the players and also allow binding agreements to be made. This requires some outside mechanism to enforce the agreements.

The cooperative theory is divided into two classes of problems depending on whether or not there is a mechanism for transfer of utility from one player to the other: TU (transferable utility) and NTU (non-transferable utility).

Feasible Sets of Payoff Vectors : For any game it is important to know the set of feasible payoff vectors for the players. For a noncooperative game, this set is called the noncooperative feasible set.

When players cooperate in a bimatrix game with matrices [A, B], they may agree to achieve a payoff vector at any of the $mn$ points $(a_{ij}, b_{ij})$, $i = 1, \dots, m$, $j = 1, \dots, n$. They may also agree to any probability mixture of these points.

The NTU feasible set is the convex hull of the $mn$ points $(a_{ij}, b_{ij})$, $i = 1, \dots, m$, $j = 1, \dots, n$.

The distinguishing feature of the TU case is that the players may make side payments of utility as part of the agreement.

Thus the whole line of slope −1 through the point $(a_{ij}, b_{ij})$ is part of the TU feasible set.

A feasible payoff vector $(v_1, v_2)$ is said to be Pareto optimal if for any feasible payoff vector $(v_1', v_2')$ with $v_1' \geq v_1$ and $v_2' \geq v_2$, we have $(v_1', v_2') = (v_1, v_2)$.

The TU-Problems

The TU Solution : If the players come to an agreement, then rationality implies that they will agree to play to achieve the largest possible total payoff, call it $\sigma = \max\{a_{ij} + b_{ij} : i, j\}$, as the payoff to be divided between them.

Suppose now that the players have selected their threat strategies, say p for Player I and q for Player II. If no agreement is reached, these threats yield the payoff vector $D = (D_1, D_2) = (p^T A q, p^T B q)$. This point is in the NTU feasible set and is called the disagreement point or threat point.

Player I will accept no less than $D_1$ and Player II will accept no less than $D_2$, since these amounts can be achieved even if no agreement is reached.

They will split evenly the excess of $\sigma$ over $D_1 + D_2$.

The resolution point is then $\phi = (\phi_1, \phi_2) = \left(\frac{1}{2}(\sigma + D_1 - D_2), \frac{1}{2}(\sigma - D_1 + D_2)\right)$. To select the threat optimally, Player I wants to maximize $D_1 - D_2$ and Player II wants to minimize it.

Let $\delta = Val(A - B) = {p^*}^T (A - B) q^*$. Then $\phi = \left(\frac{1}{2}(\sigma + \delta), \frac{1}{2}(\sigma - \delta)\right)$.
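A sketch with hypothetical payoff matrices, chosen so that $A - B$ has a saddle point and $\delta = Val(A - B)$ can be read off without linear programming (all numbers are illustrative):

```python
# Hypothetical 2x2 bimatrix game [A, B]; A - B happens to have a saddle
# point, so delta = Val(A - B) is found by checking maximin == minimax.
A = [[4, 0], [2, 1]]
B = [[1, 2], [0, 3]]
m, n = 2, 2

sigma = max(A[i][j] + B[i][j] for i in range(m) for j in range(n))       # 5

Dmat = [[A[i][j] - B[i][j] for j in range(n)] for i in range(m)]
lower = max(min(row) for row in Dmat)                                    # maximin
upper = min(max(Dmat[i][j] for i in range(m)) for j in range(n))         # minimax
assert lower == upper          # saddle point: this common number is Val(A - B)
delta = lower                                                            # -2

phi = ((sigma + delta) / 2, (sigma - delta) / 2)
print(sigma, delta, phi)       # 5 -2 (1.5, 3.5)
```

For matrices without a saddle point, $Val(A - B)$ would instead require solving the zero-sum game, e.g. by linear programming.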

All such disagreement points must lie on the same 45-degree line.

Nash Bargaining Model : A bargaining problem is given by $(S, u^*, v^*)$ such that

  1. S is a compact (i.e. bounded and closed), convex set, in the plane.
    It is somewhat more general in that it does not have to be a polyhedral set. It could be a circle or an ellipse, for example.
  2. ( u ∗ , v ∗ ) ∈ S (u^∗, v^∗) \in S (u,v)S, is called the threat point, disagreement point or status-quo point.
    The bargaining players have to decide on a point in S. The status-quo point, ( u ∗ , v ∗ ) (u^∗, v^∗) (u,v), is the settlement when negotiation breaks down.

Given an NTU-feasible set, S, and a threat point or status-quo point, ( u ∗ , v ∗ ) ∈ S (u^∗ , v^∗) \in S (u,v)S, the problem is to find a point, ( u # , v # ) = f ( S , u ∗ , v ∗ ) (u^\# , v^\#) =f(S, u^∗ , v^∗) (u#,v#)=f(S,u,v) , to be considered a “fair and reasonable outcome” or “solution” of the game for an arbitrary compact convex set S and point ( u ∗ , v ∗ ) ∈ S (u^∗ , v^∗) \in S (u,v)S.

Nash Axioms for f ( S , u ∗ , v ∗ ) = ( u # , v # ) f(S, u^∗, v^∗) =(u^\# , v^\#) f(S,u,v)=(u#,v#).

  1. Feasibility. ( u # , v # ) ∈ S (u^\#, v^\#) \in S (u#,v#)S.
  2. Pareto Optimality. ( u # , v # ) (u^\# , v^\#) (u#,v#) is Pareto optimal in S.
  3. Symmetry. If S is symmetric about the 45-degree line $u = v$, and if $u^* = v^*$, then $u^\# = v^\#$.
  4. Independence of irrelevant alternatives. If T is a closed, convex subset of S, and if the threat point $(u^*, v^*) \in T$ and $f(S, u^*, v^*) = (u^\#, v^\#) \in T$, then $f(T, u^*, v^*) = (u^\#, v^\#)$.
  5. Invariance under change of location and scale. If $T = \{(\bar u, \bar v) : \bar u = \alpha_1 u + \beta_1,\ \bar v = \alpha_2 v + \beta_2 \text{ for } (u, v) \in S\}$, where $\alpha_1 > 0$, $\alpha_2 > 0$, and $\beta_1, \beta_2$ are given numbers, then $f(T, \alpha_1 u^* + \beta_1, \alpha_2 v^* + \beta_2) = (\alpha_1 u^\# + \beta_1, \alpha_2 v^\# + \beta_2)$.

Theorem : There exists a unique function f satisfying the Nash axioms. Moreover, if there exists a point $(u, v) \in S$ such that $u > u^*$ and $v > v^*$, then $f(S, u^*, v^*)$ is the point of S that maximizes $(u - u^*)(v - v^*)$, called the Nash product, among points of S with $u \geq u^*$ and $v \geq v^*$.
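On a TU feasible set $\{u + v \leq \sigma\}$, maximizing the Nash product recovers the earlier TU split formula. A grid-search sketch with hypothetical numbers ($\sigma = 5$, threat point $(u^*, v^*) = (1, 2)$):

```python
# Grid search for the Nash product maximizer on the TU feasible set
# u + v <= sigma, with hypothetical sigma = 5 and threat point (1, 2).
sigma, us, vs = 5.0, 1.0, 2.0

best_u, best_val = None, -1.0
steps = 4000
for k in range(steps + 1):
    u = us + (sigma - vs - us) * k / steps   # u ranges over [u*, sigma - v*]
    v = sigma - u                            # stay on the Pareto boundary u + v = sigma
    val = (u - us) * (v - vs)                # Nash product
    if val > best_val:
        best_u, best_val = u, val

closed_form = ((sigma + us - vs) / 2, (sigma - us + vs) / 2)
print((best_u, sigma - best_u), closed_form)   # both ≈ (2.0, 3.0)
```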

Kalai-Smorodinsky solution : For any bargaining problem $(S, (d_1, d_2))$, define $I_1 = \max\{u : (u, v) \in S \text{ for some } v,\ d_1 \leq u,\ d_2 \leq v\}$ and $I_2 = \max\{v : (u, v) \in S \text{ for some } u,\ d_1 \leq u,\ d_2 \leq v\}$. The Kalai-Smorodinsky solution $(u_1, u_2)$ is the point on the Pareto optimal boundary such that $(u_1 - d_1)/(I_1 - u_1) = (u_2 - d_2)/(I_2 - u_2)$.

Zeuthen’s Principle

Suppose that, at a given stage of the bargaining process, Player I's last offer corresponded to the payoff vector $v = (v_1, v_2)$, whereas Player II's last offer corresponded to the payoff vector $w = (w_1, w_2)$, where v and w belong to the Pareto optimal boundary of the cooperative feasible set, with $v_1 > w_1$ but $v_2 < w_2$. Which player will have to make the next concession?

If Player I accepts his opponent's last offer w, then he will get payoff $w_1$. On the other hand, if he insists on his own last offer v, then he will obtain either the payoff $v_1$ or, if Player II sticks to his last offer, the disagreement payoff $d_1$.

Suppose Player I assigns probability p to the latter possibility and probability 1 − p to the former. Then he must conclude that by insisting on his last offer v and making no further concession he will get the expected payoff $(1-p)v_1 + p\,d_1$, whereas by being conciliatory and accepting Player II's last offer w he will get $w_1$. Therefore, Player I can rationally stick to his last offer only if $(1-p)v_1 + p\,d_1 \geq w_1$, i.e. if $p \leq (v_1 - w_1)/(v_1 - d_1) = r_1$.

We will call $r_1$ and $r_2$ the two players' risk limits, expressed as ratios of payoff differences.

Zeuthen suggests that the next concession must always come from the player with the smaller risk limit.

Theorem : If the two players follow Zeuthen's Principle, then the next concession will always be made by the player whose last offer is associated with the smaller Nash product (unless both are associated with equal Nash products, in which case both of them have to make concessions).

Games in Coalitional Form

A coalition S is defined to be a subset of N, $S \subset N$, and the set of all coalitions is denoted by $2^N$. We also speak of the empty set Ø as a coalition, the empty coalition. The set N is also a coalition, called the grand coalition.

The coalitional form of an n-person game is given by the pair (N, v), where v is a real-valued function, called the characteristic function of the game, defined on the set $2^N$ of all coalitions (subsets of N) and satisfying

  1. v(Ø) =0
  2. (superadditivity) if S and T are disjoint coalitions ($S \cap T = Ø$), then $v(S) + v(T) \leq v(S \cup T)$.

We define v(S) for each $S \in 2^N$ as the value of the 2-person zero-sum game obtained when the coalition S acts as one player and the complementary coalition $S^c = N - S$ acts as the other player, and where the payoff to S is $\sum_{i \in S} u_i(x_1, \dots, x_n)$, the total of the payoffs to the players in S.

The value v(S) is the analogue of the safety level: it represents the total amount that coalition S can guarantee for itself. v is a characteristic function.

A game in coalitional form is said to be constant-sum if $v(S) + v(S^c) = v(N)$ for all coalitions $S \in 2^N$. It is said to be zero-sum if, in addition, $v(N) = 0$.

A payoff vector, x = ( x 1 , x 2 , . . , x n ) x = (x_1, x_2, . ., x_n) x=(x1,x2,..,xn), is said to be group rational or efficient if x 1 + … + x n = v ( N ) x_1+…+x_n = v(N) x1++xn=v(N).

A payoff vector x is said to be individually rational if $x_i \geq v(\{i\})$ for all i.

An imputation is a payoff vector that is group rational and individually rational.

A game in coalitional form is said to be inessential if ∑ i v ( { i } ) = v ( N ) \sum_iv(\{i\})=v(N) iv({i})=v(N), and essential if ∑ i v ( { i } ) ≠ v ( N ) \sum_iv(\{i\})\not=v(N) iv({i})=v(N).

An imputation x is said to be unstable through a coalition S if v ( S ) > ∑ i ∈ S x i v(S) >\sum_{i∈S}x_i v(S)>iSxi

The set, C, of stable imputations is called the core, C = { x = ( x 1 , . . , x n ) : ∑ i ∈ N x i = v ( N ) a n d ∑ i ∈ S x i ≥ v ( S ) , f o r a l l S ⊂ N } C =\{x = (x_1, . ., x_n) :\sum_{i∈N} x_i = v(N) and \sum _{i∈S}x_i \geq v(S), for all S\subset N\} C={x=(x1,..,xn):iNxi=v(N)andiSxiv(S),forallSN}.

The core of an essential n-person constant-sum game is empty.

Shapley Value

The Shapley Value is to assign to each game in coalitional form a unique vector of payoffs, called the value. The ith entry of the value vector may be considered as a measure of the value or power of the ith player in the game.

$\varphi_i(v)$ represents the worth or value of player i in the game with characteristic function v. The axioms of fairness are placed on the function $\varphi$.

The Shapley Axioms for φ(v):

  1. Efficiency. ∑ i φ i ( v ) = v ( N ) \sum_i φ_i(v) =v(N) iφi(v)=v(N).
  2. Symmetry. If i and j are such that v ( S ∪ { i } ) = v ( S ∪ { j } ) v(S∪\{i\}) =v(S∪\{j\}) v(S{i})=v(S{j}) for every coalition S not containing i and j, then φ i ( v ) = φ j ( v ) φ_i (v) =φ_j (v) φi(v)=φj(v).
  3. Dummy Axiom. If i is such that v ( S ) = v ( S ∪ { i } ) v(S) =v(S ∪\{i\}) v(S)=v(S{i}) for every coalition S not containing i, then φ i ( v ) = 0 φ_i (v) =0 φi(v)=0.
  4. Additivity. If u and v are characteristic functions, then φ ( u + v ) = φ ( u ) + φ ( v ) φ(u +v) =φ(u) + φ(v) φ(u+v)=φ(u)+φ(v).

Example : Let $S \subset N$. Define a characteristic function $w_S$ by $w_S(T) = 1$ if $S \subset T$ and $w_S(T) = 0$ otherwise. Find $\varphi(w_S)$.

Solution : φ i ( w S ) = 0 \varphi_i(w_S)=0 φi(wS)=0 for i ∉ S i\not \in S iS, φ i ( w S ) = 1 ∣ S ∣ \varphi_i(w_S)=\frac{1}{|S|} φi(wS)=S1 for i ∈ S i \in S iS

Theorem : There exists a unique function φ satisfying the Shapley axioms.

Claim : Any v may be written as $v = \sum_{S \subseteq N} c_S w_S$, where

$$c_T = v(T) - \sum_{S \subsetneq T} c_S$$

The Shapley value is just the average payoff to the players when the players enter in a completely random order.

Theorem: The following value satisfies the Shapley axioms. φ = ( φ 1 , . . , φ n ) φ = (φ_1 , . ., φ_n) φ=(φ1,..,φn), where for i = 1 , . . , n i= 1, . ., n i=1,..,n, φ i ( v ) = ∑ S ⊆ N , i ∈ S [ ( ∣ S ∣ − 1 ) ! ( n − ∣ S ∣ ) ! ] [ v ( S ) − v ( S − { i } ) ] / n ! φ_i(v) =\sum_{S⊆N,i∈S}[(|S|−1)!(n−|S|)!] [v(S)−v(S −\{i\})]/ n! φi(v)=SN,iS[(S1)!(nS)!][v(S)v(S{i})]/n! .
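The random-order description translates directly into code. A brute-force sketch over all $n!$ orders (the function name is ours), checked against the $\varphi(w_S)$ example above:

```python
from itertools import permutations
from math import factorial

def shapley(players, v):
    """Shapley value as the average marginal contribution over all orders."""
    phi = {i: 0.0 for i in players}
    for order in permutations(players):
        coalition = frozenset()
        for i in order:
            phi[i] += v(coalition | {i}) - v(coalition)
            coalition = coalition | {i}
    n_fact = factorial(len(players))
    return {i: phi[i] / n_fact for i in players}

# Check against the w_S example: S = {1, 2} inside N = {1, 2, 3}.
S = frozenset({1, 2})
w_S = lambda T: 1.0 if S <= T else 0.0
print(shapley([1, 2, 3], w_S))   # {1: 0.5, 2: 0.5, 3: 0.0}
```

Enumerating all orders is only practical for small n; for larger games one would sample random orders instead.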

The Shapley value doesn't necessarily lie in the core, even when the core is not empty.

Definition. A game ( N , v ) (N, v) (N,v) is simple if for every coalition S ⊂ N S\subset N SN , either v ( S ) = 0 v(S) =0 v(S)=0 or v ( S ) = 1 v(S) =1 v(S)=1.

φ i ( v ) = ∑ S    w i n n i n g , S − { i }    l o s i n g ( ∣ S ∣ − 1 ) ! ( n − ∣ S ∣ ) ! / n ! φ_i(v) =\sum_{S\; winning ,S−\{i\}\; losing} (|S| −1)!(n −|S|)!/n! φi(v)=Swinning,S{i}losing(S1)!(nS)!/n!

The Shapley value is individually rational, i.e. $\varphi_i(v) \geq v(\{i\})$.

Strong Monotonicity Axiom : A value function φ is said to be strongly monotonic if for any two set functions v, w such that $v(S \cup \{i\}) - v(S) \geq w(S \cup \{i\}) - w(S)$ for every S not containing i, we have $\varphi_i(v) \geq \varphi_i(w)$.

Theorem: (P. Young): Shapley value is the unique set function satisfying the Efficiency Axiom, Symmetry Axiom, Dummy Axiom and Strong Monotonicity Axiom.

The Nucleolus

Given a characteristic function v, we try to find an imputation $x = (x_1, \dots, x_n)$ that minimizes the worst inequity. That is, we ask each coalition S how dissatisfied it is with the proposed imputation x, and we try to minimize the maximum dissatisfaction.

Definition of Excess : The measure of the inequity of an imputation x for a coalition S is the excess, $e(x, S) = v(S) - \sum_{j \in S} x_j$.

When the largest excess has been made as small as possible, we concentrate on the next largest excess, and adjust x to make it as small as possible, and so on. When we cannot improve further, the imputation we get is called the nucleolus.

The Bankruptcy Game : (O'Neill (1982)) A small company goes bankrupt owing money to three creditors. The company owes creditor A $10,000, creditor B $20,000, and creditor C $30,000. If the company has only $36,000 to cover these debts, how should the money be divided among the creditors?

A pro rata split of the money would lead to the allocation of $6000 for A, $12000 for B, and $18000 for C, denoted by x = (6, 12, 18) in thousands of dollars.
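To see the excess machinery concretely, one can tabulate $e(x, S)$ for the pro rata split, assuming the usual bankruptcy characteristic function $v(S) = \max(0, E - \sum \text{claims outside } S)$ (this form of v is our assumption; the notes do not spell it out):

```python
# Excesses e(x, S) = v(S) - sum_{j in S} x_j of the pro rata split
# x = (6, 12, 18) for the bankruptcy game, assuming the characteristic
# function v(S) = max(0, E - claims held outside S), estate E = 36.
from itertools import combinations

claims = {'A': 10, 'B': 20, 'C': 30}
E = 36
x = {'A': 6, 'B': 12, 'C': 18}

def v(S):
    outside = sum(c for p, c in claims.items() if p not in S)
    return max(0, E - outside)

for r in range(1, 3):
    for S in combinations(claims, r):
        print(S, v(S), v(S) - sum(x[p] for p in S))
```

Under these assumptions the largest excess, −4, belongs to {B, C}, so the first linear program of the nucleolus computation would concentrate on pushing that excess down.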

The first step of finding the nucleolus is to find a vector $x = (x_1, \dots, x_n)$ that minimizes the maximum of the excesses $e(x, S)$ over all S, subject to $\sum x_j = v(N)$. This problem of minimizing the maximum of a collection of linear functions subject to a linear constraint is easily converted to a linear programming problem and can thus be solved by the simplex method, for example. After this is done, one may have to solve a second linear programming problem to minimize the next largest excess, and so on.

Theorem : The nucleolus of a game in coalitional form exists and is unique. The nucleolus is group rational, individually rational, and satisfies the symmetry axiom and the dummy axiom. If the core is not empty, the nucleolus lies in the core.

The Marriage Game

Definition: A matching W is said to be stable if no boy and girl not matched in W prefer each other to their W-mates.

Deferred Acceptance Algorithm

Definition: Call a boy (girl) feasible for a girl (boy) if there exists a stable matching in which they are matched.

Theorem: In a girl-propose algorithm, no girl is ever rejected by a boy feasible for her. Similarly, in a boy-propose algorithm, no boy is ever rejected by a girl feasible for him.

Corollary: In the girl-propose algorithm, the girls are matched to optimal feasible boys. Therefore, girl-propose algorithm is girl-optimal.
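A sketch of the girl-propose deferred acceptance algorithm (assuming equal numbers of girls and boys and complete preference lists; all names are illustrative):

```python
def deferred_acceptance(proposer_prefs, responder_prefs):
    """Proposer-propose deferred acceptance; returns proposer -> responder.
    Assumes equal numbers on both sides and complete preference lists."""
    # rank[r][p] = position of proposer p in r's list (lower = preferred)
    rank = {r: {p: k for k, p in enumerate(prefs)}
            for r, prefs in responder_prefs.items()}
    free = list(proposer_prefs)                # proposers without a partner
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                               # responder -> current proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]  # p's best not-yet-tried responder
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:
            free.append(engaged[r])            # r trades up; old partner is free
            engaged[r] = p
        else:
            free.append(p)                     # r rejects p; p proposes again
    return {p: r for r, p in engaged.items()}

# Girl-propose on a tiny instance:
girls = {'g1': ['b1', 'b2'], 'g2': ['b1', 'b2']}
boys  = {'b1': ['g2', 'g1'], 'b2': ['g1', 'g2']}
print(deferred_acceptance(girls, boys))
```

In this instance the unique stable matching is g1–b2, g2–b1: both girls prefer b1, but b1 prefers g2, so g1 ends up with b2.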

The College Admission Game

One-sided Matching

David Gale’s Top Trading Cycle algorithm
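A sketch of the Top Trading Cycle algorithm for the house-swap setting (one house per agent; names and data are illustrative):

```python
def top_trading_cycle(prefs, owner):
    """Gale's Top Trading Cycle for one house per agent.
    prefs[agent]: houses in preference order; owner[house]: initial owner.
    Returns agent -> assigned house."""
    assignment = {}
    remaining = set(owner)                     # houses still on the market
    while remaining:
        favorite, points_to = {}, {}
        for h in remaining:                    # each active agent points at
            a = owner[h]                       # the owner of its favorite house
            fav = next(x for x in prefs[a] if x in remaining)
            favorite[a] = fav
            points_to[a] = owner[fav]
        a = owner[next(iter(remaining))]       # follow pointers; a cycle must appear
        seen = []
        while a not in seen:
            seen.append(a)
            a = points_to[a]
        for a in seen[seen.index(a):]:         # trade along the cycle
            assignment[a] = favorite[a]
            remaining.discard(favorite[a])
    return assignment

prefs = {1: ['h2', 'h1', 'h3'], 2: ['h1', 'h2', 'h3'], 3: ['h1', 'h2', 'h3']}
owner = {'h1': 1, 'h2': 2, 'h3': 3}
print(top_trading_cycle(prefs, owner))   # 1 gets h2, 2 gets h1, 3 keeps h3
```

Agents 1 and 2 point at each other and trade in the first round; agent 3, whose favorite houses are gone, keeps its own house in the second round.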