2021-10-29

Introduction

Intro

Setting: agents making strategic decisions (the new ingredient relative to single-agent models) in dynamic environments.

  • Entry and exit: Collard-Wexler (2013)
  • Sunk costs: Ryan (2012)
  • Innovation: Goettler and Gordon (2011)
    • (or whatever changes in response to investment)
  • Exploitation of natural resources: Huang and Smith (2014)
  • Durable goods: Esteban and Shum (2007)

Lit review: forthcoming IO Handbook chapter Aguirregabiria, Collard-Wexler, and Ryan (2021)

Single- vs Multi-Agent

Typically in IO we study agents in strategic environments. This becomes complicated in dynamic environments.

  • Curse of dimensionality
    • Single agent: need to track what the agent sees (\(k\) states)
    • Multi-agent: need to keep track of what every agent sees (\(k^J\) states)
    • The difference is exponential in the number of agents: with \(k = 10\) and \(J = 5\), that is 10 vs. 100,000 states
  • Expectations
    • Need not only to keep track of how the environment evolves
    • … but also of how other players act
  • Equilibrium
    • Because of the strategic interaction, the Bellman equation is not a contraction anymore
      • Equilibrium existence?
      • Equilibrium uniqueness?

Plan

We will cover first the estimation and then the computation of dynamic games

  • Weird…
  • Standard estimation method: Bajari, Benkard, and Levin (2007)
  • Does not require solving the model
  • Indeed, that’s the advantage of the method
  • Disadvantages: still need to solve the model for counterfactuals
  • So we’ll cover computation afterwards

Last: bridge between Structural IO and Artificial Intelligence

  • Different objectives but similar methods
  • Dynamic tools are a niche in IO but at the core of AI

Bajari, Benkard, Levin (2007)

Model

Stylized version of Ericson and Pakes (1995) (no entry/exit)

  • \(J\) firms (products) indexed by \(j \in \lbrace 1, ..., J \rbrace\)

  • Time \(t\) is discrete, horizon is infinite

  • States \(s_{jt} \in \lbrace 1, ..., \bar s \rbrace\): quality of product \(j\) in period \(t\)

  • Actions \(a_{jt} \in \mathbb R^+\): investment decision of firm \(j\) in period \(t\)

  • Static payoffs \[ \pi_j (s_{jt}, \boldsymbol s_{-jt}, a_{jt}; \theta^\pi) \] where

    • \(\boldsymbol s_{-jt}\): state vector of all other firms in period \(t\)
    • \(\theta^\pi\): parameters that govern static profits

Note: if we micro-fund \(\pi(\cdot)\), e.g. with some demand and supply model, we have 2 strategic decisions: prices (static) and investment (dynamic).

Model (2)

  • State transitions \[ \boldsymbol s_{t+1} = f(\boldsymbol s_t, \boldsymbol a_t, \boldsymbol \epsilon_t; \theta^f) \] where

    • \(\boldsymbol a_t\): vector of actions of all firms
    • \(\boldsymbol \epsilon_t\): vector of idiosyncratic shocks
    • \(\theta^f\): parameters that govern state transitions
  • Objective function: firms maximize expected discounted future profits \[ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] \]

Value Function

The value function of firm \(j\) at time \(t\) in state \(\boldsymbol s_{t}\), under a set of strategy functions \(\boldsymbol P\) (one for each firm) is \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \] where

  • \(\pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi )\) are the static profits of firm \(j\) given action \(a_{jt}\) and policy functions \(\boldsymbol P_{-j}\) for all firms apart from \(j\)

  • The expectation \(\mathbb E\) is taken with respect to the conditional transition probabilities \(f^{\boldsymbol P_{-j}} (\mathbf{s}_{t+1} | \mathbf{s}_{t}, a_{jt} ; \theta^f)\)

Equilibrium

Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)

  • Assumption: players’ strategies at period \(t\) are functions only of payoff-relevant state variables at the same period
  • Definition: a set of \(J\) value and policy functions, \(\boldsymbol V\) and \(\boldsymbol P\) such that each firm
    1. maximizes its value function \(V_j\)
    2. given the policy function of every other firm \(\boldsymbol P_{-j}\)

What is it basically?

  • Nash Equilibrium in the policy functions
  • What are we ruling out?
    • Strategies that depend on longer histories
    • E.g. “has anyone ever cheated in a cartel?”

Estimation

We want to estimate 2 sets of parameters:

  • \(\theta^\pi\): parameterizes period profit function \(\pi(\cdot)\)
  • \(\theta^f\): parameterizes state transition function \(f(\cdot)\)

Generally 2 approaches

  1. Full solution
    • Impractical (we’ll see more details later)
  2. Rely on some sort of Hotz and Miller (1993) CCP inversion
    • Aguirregabiria and Mira (2007)
    • Bajari, Benkard, and Levin (2007)
    • Pakes, Ostrovsky, and Berry (2007)
    • Pesendorfer and Schmidt-Dengler (2008)

BBL Overview

Bajari, Benkard, and Levin (2007) plan

  1. Estimate transition probabilities and conditional choice probabilities from the data
  2. Use them to simulate the expected value function, given a set of parameters
  3. Use optimality of estimated choices to pin down static profit parameters
    • I.e. repeat (2) for alternative strategies
      • By definition suboptimal
    • Estimating equation: values implied by observed strategies should be higher than values implied by alternative strategies

BBL: First Stage

  • Estimate the transition probabilities \(f ( \cdot | a_{jt}, \boldsymbol s_t; \hat \theta^f )\)
    • I.e. what is the observed frequency of any state-to-state transition?
    • For any given action of firm \(j\)
  • … and conditional choice probabilities \(\hat P_j(\cdot | \boldsymbol s_t)\)
    • I.e. what is the probability of each action, for each firm \(j\), in each state \(\boldsymbol s\)?
  • Can be done non-parametrically
    • i.e. just observe frequencies
    • Conditional on having enough data
    • Note: need to estimate transitions, conditional on each state and action
    • Problem with many states and actions, but especially with many players
      • Curse of dimensionality
      • Number of states increases exponentially in number of players

Important: imposing parametric assumptions in the first stage would generally contradict the model, since the equilibrium value/policy functions need not have a known parametric form
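As an illustration, here is a minimal sketch of the frequency estimator for the CCPs, assuming a long panel with integer-coded states and actions (all variable names are hypothetical; the transitions \(f(\cdot | a, \boldsymbol s)\) can be estimated analogously with a three-dimensional count array):

```python
import numpy as np

def estimate_ccp(states, actions, n_states, n_actions):
    """Non-parametric CCP estimate: observed frequency of each action in each state.

    states, actions: integer-coded panel observations for one firm.
    Returns P with P[s, a] = empirical Pr(action a | state s).
    """
    counts = np.zeros((n_states, n_actions))
    for s, a in zip(states, actions):
        counts[s, a] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never visited get a row of zeros instead of a division error
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```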

BBL: Second Stage

First step: from transitions \(f(\hat \theta^f)\) and CCPs \(\boldsymbol{\hat P}\) to values

  • We can use transitions and CCPs to simulate histories (of length \(\tilde T\))

    • of states \(\lbrace \boldsymbol{\tilde{s}}_{\tau} \rbrace_{\tau = 0}^{\tilde T}\)
    • and actions \(\lbrace \boldsymbol{\tilde{a}}_{\tau} \rbrace_{\tau = 0}^{\tilde T}\)
  • Given a parameter value \(\tilde \theta^\pi\), we can compute static payoffs: \(\pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right)\)

  • Simulated history + static payoffs = simulated value function \[ {V}_{j}^{\boldsymbol {\hat{P}}} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right) \]

  • We can average over many, e.g. \(R\), simulated value functions to get an expected value function \[ {V}_{j}^{\boldsymbol {\hat{P}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \frac{1}{R} \sum_{r=1}^{R}\Bigg( \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right) \Bigg) \]

In practice, for a parameter value \(\tilde \theta^\pi\)

For \(r = 1, ..., R\) simulations do:

  • Initialize firm values to zero
  • For \(\tau=0, ..., \tilde T\) do:
    • In the current state \(\boldsymbol{\tilde s}^{(r)}_{\tau}\) do:
      • Use \(\boldsymbol{\hat P}\) to draw a vector of firm actions \(\boldsymbol{\tilde a}^{(r)}_{\tau}\)
      • For each firm \(j = 1, ..., J\) do:
        • Compute static profits \(\pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right)\)
        • Add discounted profits \(\beta^{\tau} \pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right)\) to the value function
      • Use \(f ( \cdot \ | \ \boldsymbol{\tilde a}^{(r)}_{\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau}; \hat \theta^f )\) to draw the next state \(\boldsymbol{\tilde s}^{(r)}_{\tau + 1}\)
      • Use the next state \(\boldsymbol{\tilde s}^{(r)}_{\tau + 1}\) as the current state for the next iteration

Then average all the value functions together to obtain an expected value function \(V_{j}^{\boldsymbol {\hat{P}}, R} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right)\)

Note: advantage of simulations: can be parallelized
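A minimal sketch of this forward-simulation step, assuming states and action profiles are integer-coded and that `P_hat`, `f_hat`, and `profit` come from the first stage (all names hypothetical):

```python
import numpy as np

def simulate_value(s0, P_hat, f_hat, profit, theta_pi, beta=0.925, T=100, R=500, seed=0):
    """Forward-simulate firm j's expected value at state s0 under estimated CCPs.

    P_hat[s]    : probability of each action profile in state s
    f_hat[s, a] : distribution over next states given state s and action profile a
    profit(a, s, theta_pi) : firm j's static payoff
    """
    rng = np.random.default_rng(seed)
    values = np.zeros(R)
    for r in range(R):
        s, value = s0, 0.0
        for tau in range(T + 1):
            a = rng.choice(len(P_hat[s]), p=P_hat[s])        # draw actions from CCPs
            value += beta**tau * profit(a, s, theta_pi)      # accumulate discounted profits
            s = rng.choice(len(f_hat[s, a]), p=f_hat[s, a])  # draw next state
        values[r] = value
    return values.mean()  # average over simulations: expected value function
```

Since each simulated history is independent, the loop over \(r\) parallelizes trivially.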

Objective Function

What have we done so far?

  • Given some parameters \(\theta^\pi\), we computed the expected value function

How do we pick the \(\theta^\pi\) that best rationalizes the data?

  • I.e. what is the objective function?
  • Potentially many options

BBL idea

  • The expected value function has to be optimal, given the CCPs
  • I.e. any other policy function should give a lower expected value
  • “Best” \(\theta^\pi\): those for which the implied expected value function under the estimated CCPs is greater than the one implied by any other CCP

Note: it’s an inequality statement

Objective Function (2)

Idea

  • If the observed policy \({\color{green}{\boldsymbol{\hat P}}}\) is optimal,

    • All other policies \({\color{red}{\boldsymbol{\tilde P}}}\)

    • … at the true parameters \(\theta^\pi\)

    • … should give a lower expected value \[ V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \leq V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \]

  • So which are the true parameters?

    • Those for which any deviation from the observed policy \({\color{green}{\boldsymbol{\hat P}}}\) yields a lower value

    • Objective function to minimize: violations under alternative policies \({\color{red}{\boldsymbol{\tilde P}}}\) \[ \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} \]

Estimator

Estimator: \(\theta^\pi\) that minimizes the average (squared) magnitude of violations for any alternative policy \({\color{red}{\boldsymbol{\tilde P}}}\) \[ \hat{\theta}^\pi= \arg \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} \]

  • \(\min \Big\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} - V_j^{{\color{red}{\boldsymbol{\tilde P}}}, R} \ , \ 0 \Big\rbrace\) to pick only the violations
    • If \({\color{green}{\boldsymbol{\hat P}}}\) implies a higher value, we can ignore it
    • It does not matter by how much the inequality is satisfied
  • Which alternative policies \({\color{red}{\boldsymbol{\tilde P}}}\) should we use?
    • In principle, any perturbation is ok
    • But in practice, if we perturb it too much, we can go too far off
    • Tip 1: start with very small perturbations
    • Tip 2: use perturbations that sensibly affect the dynamics
      • E.g. exiting in a state in which a firm is not a competitive threat
    • Tip 3: use perturbations on dimensions that are relevant for the research question
      • E.g. they affect dimensions where you want to make counterfactual predictions

Advantages

We have seen that there are competing methods.

What are the advantages of Bajari, Benkard, and Levin (2007) over those?

  1. Continuous actions
    • BBL does not require actions to be discretised
    • You can just sample actions from the data!
  2. Choice of alternative CCPs
    • The researcher is free to choose the alternative CCPs \({\color{red}{\boldsymbol{\tilde P}}}\)
    • Pros: can make source of variation more transparent
      • allows the researcher to focus on those predictions of the model that are key for the specific research questions
    • Cons: it’s a very high dimensional space
      • There is a very large number of alternative policy functions

Problems

  1. Computational curse of dimensionality is gone (in the state space)
    • But we have a curse of dimensionality in data
    • Need a lot of markets because now 1 market is 1 observation
  2. Multiple equilibria??
    • We are basically assuming it away
    • By estimating the CCPs in the first stage, we assume the same equilibrium is played in all markets at all times
    • To run counterfactuals, we still need to solve the model
  3. Unobserved heterogeneity
    • Kasahara and Shimotsu (2009): how to identify the (minimum) number of unobserved types
    • Arcidiacono and Miller (2011): how to use an EM algorithm for the 1st stage estimation with unobserved types, conditional on the number of types
    • Berry and Compiani (2021): instrumental variables approach, relying on observed states in the distant past
  4. Non-stationarity
    • If we have a long time period, something fundamental might have changed

Ericson Pakes (1995)

Introduction

Ericson and Pakes (1995) and companion paper Pakes and McGuire (1994) for the computation

  • \(J\) firms indexed by \(j \in \lbrace 1, ..., J \rbrace\)

  • Time \(t\) is discrete, horizon is infinite

  • State \(s_{jt}\): quality of firm \(j\) in period \(t\)

  • Per period profits \[ \pi (s_{jt}, \boldsymbol s_{-jt} ; \theta^\pi) \] where

    • \(\boldsymbol s_{-jt}\): state vector of all other firms in period \(t\)
    • \(\theta^\pi\): parameters that govern static profits
  • We can micro-fund profits with some demand and supply functions

    • There can be some underlying static strategic interaction
    • E.g. logit demand and Bertrand competition in prices \(p_{jt}\)

State Transitions

Investment: firms can invest a dollar amount \(x\) to increase their future quality

  • Continuous decision variable (\(\neq\) Rust)

  • Probability that investment is successful \[ \Pr \big(i_{jt} = 1 \ \big| \ a_{jt} = x \big) = \frac{\alpha x}{1 + \alpha x} \]

  • Higher investment, higher success probability

  • \(\alpha\) parametrizes the returns on investment

Quality depreciation

  • With probability \(\delta\), quality decreases by one level

Law of motion \[ s_{j,t+1} = s_{jt} + i_{jt} - \nu_{jt} \] where \(i_{jt} \in \lbrace 0,1 \rbrace\) is the investment outcome and \(\nu_{jt} \in \lbrace 0,1 \rbrace\) is the depreciation shock, with \(\Pr(\nu_{jt} = 1) = \delta\)
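Combining the two shocks (assuming investment success and depreciation are independent), the implied one-step transition probabilities at interior quality levels are \[ \Pr(s_{j,t+1} = s_{jt} + 1) = \frac{\alpha x}{1 + \alpha x}(1-\delta), \qquad \Pr(s_{j,t+1} = s_{jt} - 1) = \frac{\delta}{1 + \alpha x}, \qquad \Pr(s_{j,t+1} = s_{jt}) = \frac{\alpha x \delta + (1-\delta)}{1 + \alpha x} \]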

Decision Variables

Note that in Ericson and Pakes (1995) we have two separate decision variables

  1. Static decision variable: price \(p_{jt}\)
  2. Dynamic decision variable: investment \(i_{jt}\)

Does not have to be the case!

Example: Besanko et al. (2010)

  • Model of learning-by-doing: firms decrease their marginal cost through sales
  • State variable: firm stock of know how \(e\)
    • The higher the stock of know-how, the lower the marginal cost
    • Increases when a firm manages to make a sale
      • \(q \in [0,1]\) is now both the static quantity and the transition probability
  • Single decision variable: price \(p\)
    • Usual static effects on profits \(\pi_{jt} = (p_{jt} - c(e_{jt})) \cdot q_j(\boldsymbol p_t)\)
    • But also dynamic effect through transition probabilities
      • Probability of increasing \(e_t\): \(q_j(\boldsymbol p_t)\)

Equilibrium

Firms maximize the expected flow of discounted profits \[ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] \]

Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)

  • A set of \(J\) value and policy functions, \(\boldsymbol V\) and \(\boldsymbol P\) such that each firm
    1. maximizes its value function \(V_j\)
    2. given the policy function of every other firm \(\boldsymbol P_{-j}\)

Exit

One important extension is exit.

  • In each time period, incumbents decide whether to stay
  • … or exit and get a scrap value \(\phi^{exit}\)

The Bellman equation of incumbent \(j\) at time \(t\) is \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{d^{exit}_{jt} \in \lbrace 0, 1 \rbrace} \Bigg\lbrace \begin{array}{c} \beta \phi^{exit} \ , \newline \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Big\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Big\rbrace \end{array} \Bigg\rbrace \] where

  • \(\phi^{exit}\): exit scrap value
  • \(d^{exit}_{jt} \in \lbrace 0,1 \rbrace\): exit decision

Entry

We can also incorporate endogenous entry.

  • One or more potential entrants exist outside the market
  • They can pay an entry cost \(\phi^{entry}\) and enter the market at a quality state \(\bar s\)
  • … or remain outside at no cost

Value function \[ V_{j}^{\boldsymbol P_{-j}} (\boldsymbol s_{t} ; \theta) = \max_{d^{entry} \in \lbrace 0,1 \rbrace } \Bigg\lbrace \begin{array}{c} 0 \ , \newline - \phi^{entry} + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} (\bar s, \boldsymbol s_{-j, t+1} ; \theta) \ \Big| \ \boldsymbol s_{t} ; \theta^f \Big] \end{array} \Bigg\rbrace \] where

  • \(d^{entry} \in \lbrace 0,1 \rbrace\): entry decision
  • \(\phi^{entry}\): entry cost
  • \(\bar s\): state at which an entrant enters (could be random)

Do we observe potential entrants?

  • Igami (2017): firms in the tech industry announce their entry
  • Critique: not really potential entrants, they are half-way inside

Equilibrium Existence

Doraszelski and Satterthwaite (2010): an MPE might not exist in the Ericson and Pakes (1995) model.

Solution

  • Replace fixed entry costs \(\phi^{entry}\) and exit scrap values \(\phi^{exit}\) with random ones
  • It becomes a game of incomplete information
    • First explored in Rust (1994)
  • New equilibrium concept

Markov Perfect Bayesian Nash Equilibrium (MPBNE)

  • Basically the same, with rational beliefs about the random variables

Solving the Model

Solving the model is very similar to Rust

  • Given parameter values \(\theta\)
  • Start with a guess for the value and policy functions
  • Until convergence, do:
    • For each firm \(j = 1, ..., J\), do:
      • Take the policy functions of all other firms
      • Compute the implied transition probabilities
      • Use them to compute the new policy function for firm \(j\)
      • Compute the implied value function
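A minimal sketch of this loop, assuming a user-supplied `best_response` step (a hypothetical function that computes firm \(j\)'s optimal policy and value given the other firms' policies):

```python
import numpy as np

def solve_mpe(V0, P0, best_response, tol=1e-8, max_iter=10_000):
    """Best-response iteration on value and policy functions.

    V0 : (J, n_states) array of initial value function guesses
    P0 : list of J initial policy functions
    best_response(j, V, P): firm j's updated (policy, value), taking P[-j] as given.
    """
    V, P = V0.copy(), [p.copy() for p in P0]
    for _ in range(max_iter):
        V_old = V.copy()
        for j in range(len(P)):                  # loop over firms
            P[j], V[j] = best_response(j, V, P)  # update firm j given the others
        if np.max(np.abs(V - V_old)) < tol:      # sup-norm convergence check
            return V, P
    raise RuntimeError("best-response iteration did not converge")
```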

Where do things get complicated / tricky? Policy function update

Policy Update Example: exit game

Imagine a stylized exit game with 2 firms

  • Easy to get an update rule of the form: “exit if opponent stays, stay if opponent exits”

Computationally

  • Initialize policy functions to \((exit, exit)\)
  • Iteration 1:
    • Each firm takes opponent policy as given: \(exit\)
    • Update own optimal policy: \(stay\)
    • New policy: \((stay, stay)\)
  • Iteration 2: \((stay, stay) \to (exit, exit)\)
  • Iteration 3: \((exit, exit) \to (stay, stay)\)
  • Etc…

Issues: value function iteration might not converge, and there can be multiple equilibria.

Convergence Tips

  • Try different starting values
    • Often it’s what makes the biggest difference
    • Ideally, start as close as possible to true values
    • Approximation methods can help (we’ll see more later)
      • I.e. get a fast approximation to use as starting value for the solution algorithm
  • Partial/stochastic value function update rule
    • Instead of \(V' = T(V)\), use \(V' = \alpha T(V) + (1-\alpha)V\)
    • Very good to break loops, especially if \(\alpha\) is stochastic, e.g. \(\alpha \sim U(0,1)\) (see the sketch after this list)
  • How large is the support of the entry/exit costs?
    • If support is too small, you end up back in the entry/exit loop
  • Try alternative non-parallel updating schemes
    • E.g. update the value one state at a time (in random order?)
  • Last but not least: change the model
    • In particular, from simultaneous to alternating moves
    • or continuous time
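A minimal sketch of the partial/stochastic update rule from the list above (`T_V` is the Bellman update \(T(V)\), computed elsewhere):

```python
import numpy as np

rng = np.random.default_rng(0)

def damped_update(V, T_V, stochastic=True):
    """Partial value update: V' = alpha * T(V) + (1 - alpha) * V.

    A random alpha ~ U(0,1) at each iteration helps break deterministic
    cycles like the (exit, exit) <-> (stay, stay) loop above.
    """
    alpha = rng.uniform() if stochastic else 0.5
    return alpha * T_V + (1 - alpha) * V
```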

Multiple Equilibria

How to find them?

  • Besanko et al. (2010) and Borkovsky, Doraszelski, and Kryukov (2010): homotopy method
    • can find some equilibria, but not all
    • complicated to implement: need to compute the equilibrium conditions \(H(\boldsymbol V, \theta) = 0\) and their Jacobian \(\nabla H(\boldsymbol V, \theta)\)
    • Idea: trace the equilibrium correspondence \(H^{-1} = \lbrace (\boldsymbol V, \theta) : H(\boldsymbol V, \theta) = 0 \rbrace\) in the value-parameter space
  • Eibelshäuser and Poensgen (2019)
    • Markov Quantal Response Equilibrium
    • approach dynamic games from an evolutionary game theory perspective
      • actions played at random and those bringing highest payoffs survive
    • \(\to\) homotopy method guaranteed to find one equilibrium
  • Pesendorfer and Schmidt-Dengler (2010): some equilibria are not Lyapunov-stable
    • Best-response iteration cannot find them unless you start exactly at the solution
  • Su and Judd (2012) and Egesdal, Lai, and Su (2015): same point, but numerically
    • using MPEC approach

Multiple Equilibria (2)

Can we assume them away?

  • Igami (2017)
    • Finite horizon
    • Homogeneous firms (in profit functions and state transitions)
    • One dynamic move per period (overall, not per-firm)
  • Abbring and Campbell (2010)
    • Entry/exit game
    • Homogeneous firms
    • Entry and exit decisions follow a last-in first-out (LIFO) structure
      • “An entrant expects to produce no longer than any incumbent”
  • Iskhakov, Rust, and Schjerning (2016)
    • can find all equilibria, but only for a very specific class of dynamic games
    • must always proceed “forward”
      • e.g. either entry or exit but not both
    • Idea: can solve by backward induction even if horizon is infinite

Curse of Dimensionality

What are the computational bottlenecks? \[ V^{\boldsymbol P_{-j}}_{j} ({\color{red}{\mathbf{s}_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{{\color{red}{\mathbf{s}_{t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \]

  1. Dimension of the state space
    • In single agent problems, we have as many states as values of \(s_{jt}\) (\(k\))
    • In dynamic games, the state space goes from \(k\) to \(k^J\)
    • Symmetry helps: states \([1,2,3]\) and \([1,3,2]\) become the same for firm 1
    • How much do we gain? From \(k^J\) to \(k \cdot {k + J - 2 \choose k - 1}\)
  2. Dimension of the integrand
    • If in single agent problems, we have to integrate over \(\kappa\) outcomes,
      • 4 in Rust: engine replaced (yes|no) \(\times\) mileage increases (yes|no)
    • … in dynamic games, we have to consider \(\kappa^J\) outcomes

Note: bottlenecks are not additive but multiplicative: we have to solve the expectation for each point in the state space. Improving on either of the two helps a lot.
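To get a feel for the gain from symmetry, a quick computation with hypothetical sizes (\(k = 10\) quality levels, \(J = 5\) firms):

```python
from math import comb

k, J = 10, 5  # hypothetical: 10 quality levels, 5 firms

full = k**J                             # distinct state vectors: 100,000
symmetric = k * comb(k + J - 2, k - 1)  # own state x multiset of rivals' states: 7,150

print(full, symmetric)  # 100000 7150: roughly a 14x reduction
```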

Curse of Dimensionality (2)

Two and a half classes of solutions:

  1. Computational: approximate the equilibrium
    • Doraszelski (2003): use Chebyshev polynomials as basis functions
    • Farias, Saure, and Weintraub (2012): combine approximations with a MPEC-like approach
  2. Conceptual: define another game
    • Weintraub, Benkard, and Van Roy (2008): oblivious equilibrium
    • Ifrach and Weintraub (2017): moment based equilibrium
    • Doraszelski and Judd (2012): games in continuous time
    • Doraszelski and Judd (2019): games with random moves
  3. Kind of both: Pakes and McGuire (2001)

Note: useful also to get good starting values for a full solution method!

Oblivious Equilibrium

Weintraub, Benkard, and Van Roy (2008): what if firms had no idea about the state of other firms?

  • or, equivalently: firms are atomistic

The value function becomes \[ V_{j} ({\color{red}{s_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left({\color{red}{s_{t}}}\right)} \Bigg\lbrace {\color{red}{\mathbb E_{\boldsymbol s_t}}} \Big[ \pi_{j} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) \Big| P \Big] + \beta \mathbb E_{{\color{red}{s_{t+1}}}} \Big[ V_{j} \left({\color{red}{s_{t+1}}}\right) \ \Big| \ a_{jt}, {\color{red}{s_{t}}} ; \theta^f \Big] \Bigg\rbrace \]

  • Now the state is just \(s_t\) instead of \(\boldsymbol s_t\)
    • Huge computational gain: from \(k^J\) points to \(k\)
    • Also the expectation of future states is taken over \(3\) instead of \(3^J\) points
      • (3 because quality can go up, down or stay the same)
  • But need to compute static profits as the expected value given the current policy function
    • Need to keep track of the asymptotic state distribution as you iterate the value

Games with Random Moves

Doraszelski and Judd (2019): what if, instead of moving simultaneously, firms moved one at a time, in random order?

  • Important: to have the same frequency of play, a period now is \(J\) times shorter

The value function becomes \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}, {\color{red}{n=j}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace {\color{red}{\frac{1}{J}}}\pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + {\color{red}{\sqrt[J]{\beta}}} \mathbb E_{{\color{red}{n, s_{j, t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}, {\color{red}{n}} \right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \]

  • \(n\): indicates whose turn it is to play
  • since a turn is \(J\) times shorter, per-turn profits are \(\frac{1}{J} \pi\) and the per-turn discount factor is \(\sqrt[J]{\beta}\), so that over \(J\) turns profits and discounting match the simultaneous-move game (\((\sqrt[J]{\beta})^J = \beta\))

Computational gain

  • Expectation now taken over \(n, s_{j, t+1}\) instead of \(\boldsymbol s_{t+1}\)
  • I.e. \(3J\) points instead of \(3^J\) (3 because quality can go up, down or stay the same): with \(J = 5\), that is 15 points instead of 243
  • Huge computational difference!

Games in Continuous Time

Doraszelski and Judd (2012): what’s the advantage of continuous time?

  • Probability that two firms take a decision simultaneously is zero

With continuous time, the value function becomes \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \frac{1}{\lambda(a_{jt}) - \log(\beta)} \Bigg( \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \lambda(a_{jt}) \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg) \Bigg\rbrace \]

  • \(\lambda(a_{jt}) = \delta + \frac{\alpha a_{jt}}{1 + \alpha a_{jt}}\) is the hazard rate for firm \(j\) that something happens
    • i.e. either an increase in quality, with probability \(\frac{\alpha a_{jt}}{1 + \alpha a_{jt}}\)
    • … or a decrease in quality with probability \(\delta\)

Computational gain

  • Now the expectation over future states \(\mathbb E_{\boldsymbol s_{t+1}}\) is over \(2J\) points instead of \(3^J\)
    • 3 because quality can go up, down or stay the same
    • 2 because in continuous time we don’t care if the state does not change (investment fails)

Comparison

Which method is best?

I compare them in Courthoud (2020)

  • Fastest: Weintraub, Benkard, and Van Roy (2008)
    • Effectively transforms the game into single-agent dynamics
  • Best trade-off: Doraszelski and Judd (2019)
    • Simple, practical and also helps in terms of equilibrium multiplicity
  • Also in Courthoud (2020): games with random order
    • Better approximation than Doraszelski and Judd (2019)
    • And similar computation time

Applications

Some applications of these methods include

  • Approximation methods
    • Sweeting (2013): product repositioning among radio stations
    • Barwick and Pathak (2015): entry and exit in the real estate brokerage industry
  • Oblivious equilibrium
    • Xu and Chen (2020): R&D investment in the Korean electric motor industry
  • Moment based equilibrium
    • Jeon (2020): demand learning in the container shipping industry
    • Caoui (2019): technology adoption with network effects in the movie industry
    • Vreugdenhil (2020): search and matching in the oil drilling industry
  • Games in continuous time
    • Arcidiacono et al. (2016): entry, exit and scale decisions in retail competition
  • Games with random moves
    • Igami (2017): innovation, entry, exit in the hard drive industry

From IO to AI

Bridging two Literatures

There is one method to approximate the equilibrium in dynamic games that is a bit different from the others: Pakes and McGuire (2001)

  • Idea: approximate the value function by Monte-Carlo simulation
  • Firms start with a guess for the alternative-specific value function
  • Act according to it
  • Observe realized payoffs and state transitions
  • And update the alternative-specific value function according to the realized outcomes

Experience-Based Equilibrium

  • Defined in Fershtman and Pakes (2012)
  • Def: policies are optimal given beliefs about state transitions, and observed transitions are consistent with those beliefs
  • Note: definition silent on off-equilibrium path beliefs

Pakes and McGuire (2001)

  • Players start with alternative-specific value function

    • yes, the ASV from Rust (1994)
    • \(\bar V_{j,a}^{(0)} (\boldsymbol s ; \theta)\): initial value of player \(j\) for action \(a\) in state \(\boldsymbol s\)
  • Until convergence, do:

    • Compute optimal action, given \(\bar V_{j, a}^{(t)} (\boldsymbol s ; \theta)\) \[ a^* = \arg \max_a \bar V_{j, a}^{(t)} (\boldsymbol s ; \theta) \]

    • Observe the realized payoff \(\pi_{j, a^*}(\boldsymbol s ; \theta)\) and the realized next state \(\boldsymbol {s'}(\boldsymbol s, a^*; \theta)\)

    • Update the alternative-specific value function of the chosen action \(a^*\) \[ \bar V_{j, a^*}^{(t+1)} (\boldsymbol s ; \theta) = (1-\alpha_{\boldsymbol s, t}) \bar V_{j, a^*}^{(t)} (\boldsymbol s ; \theta) + \alpha_{\boldsymbol s, t} \Big[\pi_{j, a^*}(\boldsymbol s ; \theta) + \beta \max_a \bar V_{j, a}^{(t)} (\boldsymbol s' ; \theta) \Big] \] where

      • \(\alpha_{\boldsymbol s, t} = \frac{1}{\text{number of times state } \boldsymbol s \text{ has been visited}}\)
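A minimal sketch of one update for one firm, assuming integer-coded states and hypothetical `profit` and `draw_next_state` primitives:

```python
import numpy as np

def pm_update(V, s, visits, profit, draw_next_state, beta=0.925):
    """One Pakes-McGuire (2001)-style stochastic update at state s.

    V      : (n_states, n_actions) array of alternative-specific values
    visits : visit counts per state, for the 1/n learning rate
    """
    a_star = int(np.argmax(V[s]))           # best action so far
    pi = profit(s, a_star)                  # realized payoff
    s_next = draw_next_state(s, a_star)     # realized transition
    visits[s] += 1
    alpha = 1.0 / visits[s]                 # decaying learning rate
    target = pi + beta * np.max(V[s_next])  # realized payoff + continuation value
    V[s, a_star] = (1 - alpha) * V[s, a_star] + alpha * target
    return s_next                           # continue the simulation from here
```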

Comments

Where is the strategic interaction?

  • Firms always take the “best action so far” in each state
    • Start to take a new action only when the previous best has performed badly for many periods
  • Reminiscent of the evolutionary game theory literature

Importance of starting values

  • Imagine all payoffs are positive but values are initialized to zero
  • The first action tried in each state \(\to\) only action ever taken in that state
  • Lock-in.
    • Why? Firms always take \(\arg \max_a \bar V_a\) and never explore the alternatives

Convergence by design

  • Since \(\lim_{t \to \infty} \alpha_{\boldsymbol s, t} = 0\)
  • Firms stop updating the value by design

Q-Learning

Computer Science reinforcement learning literature (AI): Q-learning

Differences

  • \(\bar V_a( \boldsymbol s)\) called \(Q_a(\boldsymbol s)\), hence the name
  • Firms don’t always take the optimal action
    • At the beginning of the algorithm: exploration
      • Firms take actions at random
      • Just to explore what happens taking different actions
    • Gradually shift towards exploitation
      • I.e. take the optimal action, given \(\bar V^{(t)}( \boldsymbol s)\) at iteration \(t\)
      • I.e. shift towards Pakes and McGuire (2001)
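A minimal sketch of the exploration-exploitation trade-off, with a hypothetical exponentially decaying exploration rate:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, t, eps0=1.0, decay=1e-4):
    """Q-learning action choice: explore at random early, exploit later.

    As epsilon decays with the iteration count t, the rule shifts towards
    always taking argmax_a Q[s, a], i.e. towards Pakes and McGuire (2001).
    """
    eps = eps0 * np.exp(-decay * t)
    if rng.uniform() < eps:
        return int(rng.integers(Q.shape[1]))  # explore: random action
    return int(np.argmax(Q[s]))               # exploit: current best action
```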

Applications

  • Doraszelski, Lewis, and Pakes (2018)
    • Firms actually do learn by trial and error
    • Setting: demand learning in the UK frequency response market (electricity)
  • Asker et al. (2020)
    • Uses Pakes and McGuire (2001) for estimation
    • Setting: dynamic timber auctions with information sharing
  • Calvano et al. (2020)
    • Study Q-learning pricing algorithms
    • In repeated price competition with differentiated products
    • (Computational) lab experiment: what do these algorithms converge to?
    • Finding: algorithms learn reward-punishment collusive strategies

Appendix

References

Abbring, Jaap H, and Jeffrey R Campbell. 2010. “Last-in First-Out Oligopoly Dynamics.” Econometrica 78 (5): 1491–1527.

Aguirregabiria, Victor, Allan Collard-Wexler, and Stephen P Ryan. 2021. “Dynamic Games in Empirical Industrial Organization.” NBER Working Paper.

Aguirregabiria, Victor, and Pedro Mira. 2007. “Sequential Estimation of Dynamic Discrete Games.” Econometrica 75 (1): 1–53.

Arcidiacono, Peter, Patrick Bayer, Jason R Blevins, and Paul B Ellickson. 2016. “Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition.” The Review of Economic Studies 83 (3): 889–931.

Arcidiacono, Peter, and Robert A Miller. 2011. “Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity.” Econometrica 79 (6): 1823–67.

Asker, John, Chaim Fershtman, Jihye Jeon, and Ariel Pakes. 2020. “A Computational Framework for Analyzing Dynamic Auctions: The Market Impact of Information Sharing.” The RAND Journal of Economics 51 (3): 805–39.

Bajari, Patrick, C Lanier Benkard, and Jonathan Levin. 2007. “Estimating Dynamic Models of Imperfect Competition.” Econometrica 75 (5): 1331–70.

Barwick, Panle Jia, and Parag A Pathak. 2015. “The Costs of Free Entry: An Empirical Study of Real Estate Agents in Greater Boston.” The RAND Journal of Economics 46 (1): 103–45.

Berry, Steven T, and Giovanni Compiani. 2021. “Empirical Models of Industry Dynamics with Endogenous Market Structure.” Annual Review of Economics 13.

Besanko, David, Ulrich Doraszelski, Yaroslav Kryukov, and Mark Satterthwaite. 2010. “Learning-by-Doing, Organizational Forgetting, and Industry Dynamics.” Econometrica 78 (2): 453–508.

Borkovsky, Ron N, Ulrich Doraszelski, and Yaroslav Kryukov. 2010. “A User’s Guide to Solving Dynamic Stochastic Games Using the Homotopy Method.” Operations Research 58 (4-part-2): 1116–32.

Calvano, Emilio, Giacomo Calzolari, Vincenzo Denicolo, and Sergio Pastorello. 2020. “Artificial Intelligence, Algorithmic Pricing, and Collusion.” American Economic Review 110 (10): 3267–97.

Caoui, El Hadi. 2019. “Estimating the Costs of Standardization: Evidence from the Movie Industry.” R&R, Review of Economic Studies.

Collard-Wexler, Allan. 2013. “Demand Fluctuations in the Ready-Mix Concrete Industry.” Econometrica 81 (3): 1003–37.

Courthoud, Matteo. 2020. “Approximation Methods for Large Dynamic Stochastic Games.” Working Paper.

Doraszelski, Ulrich. 2003. “An R&D Race with Knowledge Accumulation.” RAND Journal of Economics, 20–42.

Doraszelski, Ulrich, and Kenneth L Judd. 2012. “Avoiding the Curse of Dimensionality in Dynamic Stochastic Games.” Quantitative Economics 3 (1): 53–93.

———. 2019. “Dynamic Stochastic Games with Random Moves.” Quantitative Marketing and Economics 17 (1): 59–79.

Doraszelski, Ulrich, Gregory Lewis, and Ariel Pakes. 2018. “Just Starting Out: Learning and Equilibrium in a New Market.” American Economic Review 108 (3): 565–615.

Doraszelski, Ulrich, and Mark Satterthwaite. 2010. “Computable Markov-Perfect Industry Dynamics.” The RAND Journal of Economics 41 (2): 215–43.

Egesdal, Michael, Zhenyu Lai, and Che-Lin Su. 2015. “Estimating Dynamic Discrete-Choice Games of Incomplete Information.” Quantitative Economics 6 (3): 567–97.

Eibelshäuser, Steffen, and David Poensgen. 2019. “Markov Quantal Response Equilibrium and a Homotopy Method for Computing and Selecting Markov Perfect Equilibria of Dynamic Stochastic Games.” Working Paper.

Ericson, Richard, and Ariel Pakes. 1995. “Markov-Perfect Industry Dynamics: A Framework for Empirical Work.” The Review of Economic Studies 62 (1): 53–82.

Esteban, Susanna, and Matthew Shum. 2007. “Durable-Goods Oligopoly with Secondary Markets: The Case of Automobiles.” The RAND Journal of Economics 38 (2): 332–54.

Farias, Vivek, Denis Saure, and Gabriel Y Weintraub. 2012. “An Approximate Dynamic Programming Approach to Solving Dynamic Oligopoly Models.” The RAND Journal of Economics 43 (2): 253–82.

Fershtman, Chaim, and Ariel Pakes. 2012. “Dynamic Games with Asymmetric Information: A Framework for Empirical Work.” The Quarterly Journal of Economics 127 (4): 1611–61.

Goettler, Ronald L, and Brett R Gordon. 2011. “Does AMD Spur Intel to Innovate More?” Journal of Political Economy 119 (6): 1141–1200.

Hotz, V Joseph, and Robert A Miller. 1993. “Conditional Choice Probabilities and the Estimation of Dynamic Models.” The Review of Economic Studies 60 (3): 497–529.

Huang, Ling, and Martin D Smith. 2014. “The Dynamic Efficiency Costs of Common-Pool Resource Exploitation.” American Economic Review 104 (12): 4071–4103.

Ifrach, Bar, and Gabriel Y Weintraub. 2017. “A Framework for Dynamic Oligopoly in Concentrated Industries.” The Review of Economic Studies 84 (3): 1106–50.

Igami, Mitsuru. 2017. “Estimating the Innovator’s Dilemma: Structural Analysis of Creative Destruction in the Hard Disk Drive Industry, 1981–1998.” Journal of Political Economy 125 (3): 798–847.

Iskhakov, Fedor, John Rust, and Bertel Schjerning. 2016. “Recursive Lexicographical Search: Finding All Markov Perfect Equilibria of Finite State Directional Dynamic Games.” The Review of Economic Studies 83 (2): 658–703.

Jeon, Jihye. 2020. “Learning and Investment Under Demand Uncertainty in Container Shipping.” The RAND Journal of Economics.

Kasahara, Hiroyuki, and Katsumi Shimotsu. 2009. “Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices.” Econometrica 77 (1): 135–75.

Maskin, Eric, and Jean Tirole. 1988. “A Theory of Dynamic Oligopoly, II: Price Competition, Kinked Demand Curves, and Edgeworth Cycles.” Econometrica: Journal of the Econometric Society, 571–99.

Pakes, Ariel, and Paul McGuire. 1994. “Computing Markov-Perfect Nash Equilibria: Numerical Implications of a Dynamic Differentiated Product Model.” RAND Journal of Economics 25 (4): 555–89.

———. 2001. “Stochastic Algorithms, Symmetric Markov Perfect Equilibrium, and the ‘Curse’ of Dimensionality.” Econometrica 69 (5): 1261–81.

Pakes, Ariel, Michael Ostrovsky, and Steven Berry. 2007. “Simple Estimators for the Parameters of Discrete Dynamic Games (with Entry/Exit Examples).” The RAND Journal of Economics 38 (2): 373–99.

Pesendorfer, Martin, and Philipp Schmidt-Dengler. 2008. “Asymptotic Least Squares Estimators for Dynamic Games.” The Review of Economic Studies 75 (3): 901–28.

———. 2010. “Sequential Estimation of Dynamic Discrete Games: A Comment.” Econometrica 78 (2): 833–42.

Rust, John. 1994. “Structural Estimation of Markov Decision Processes.” Handbook of Econometrics 4: 3081–3143.

Ryan, Stephen P. 2012. “The Costs of Environmental Regulation in a Concentrated Industry.” Econometrica 80 (3): 1019–61.

Su, Che-Lin, and Kenneth L Judd. 2012. “Constrained Optimization Approaches to Estimation of Structural Models.” Econometrica 80 (5): 2213–30.

Sweeting, Andrew. 2013. “Dynamic Product Positioning in Differentiated Product Markets: The Effect of Fees for Musical Performance Rights on the Commercial Radio Industry.” Econometrica 81 (5): 1763–803.

Vreugdenhil, Nicholas. 2020. “Booms, Busts, and Mismatch in Capital Markets: Evidence from the Offshore Oil and Gas Industry.” R&R at Journal of Political Economy.

Weintraub, Gabriel Y, C Lanier Benkard, and Benjamin Van Roy. 2008. “Markov Perfect Industry Dynamics with Many Firms.” Econometrica 76 (6): 1375–1411.

Xu, Daniel Yi, and Yanyou Chen. 2020. “A Structural Empirical Model of R&D, Firm Heterogeneity, and Industry Evolution.” Journal of Industrial Economics.