Dynamic Games
Last updated on Oct 29, 2021
Introduction
Intro
Setting: agents making strategic decisions (the new element) in dynamic environments.
- Entry and exit: Collard-Wexler (2013)
- Sunk costs: Ryan (2012)
- Innovation: Goettler and Gordon (2011)
- (or whatever changes in response to investment)
- Exploitation of natural resources: Huang and Smith (2014)
- Durable goods: Esteban and Shum (2007)
Lit review: the forthcoming IO Handbook chapter by Aguirregabiria, Collard-Wexler, and Ryan (2021)
Single- vs Multi-Agent
Typically in IO we study agents in strategic environments. This becomes complicated in dynamic settings.
- Curse of dimensionality
- Single agent: need to track what the agent sees ($k$ states)
- Multi-agent: need to keep track of what every agent sees ($k^J$ states)
- The difference is exponential in the number of agents
- Expectations
- Need not only to keep track of how the environment evolves
- … but also of how other players act
- Equilibrium
- Because of the strategic interaction, the Bellman equation is not a contraction anymore
- Equilibrium existence?
- Equilibrium uniqueness?
Plan
We will cover first the estimation and then the computation of dynamic games
- An unusual order…
- Standard estimation method: Bajari, Benkard, and Levin (2007)
- Does not require solving the model
- Indeed, that’s the advantage of the method
- Disadvantage: you still need to solve the model for counterfactuals
- So we’ll cover computation afterwards
Last: bridge between Structural IO and Artificial Intelligence
- Different objectives but similar methods
- Dynamic tools niche in IO but at the core of AI
Bajari, Benkard, Levin (2007)
Model
Stylized version of Ericson and Pakes (1995) (no entry/exit)
- $J$ firms (products) indexed by $j \in \lbrace 1, …, J \rbrace$
- Time $t$ is discrete, horizon is infinite
- States $s_{jt} \in \lbrace 1, …, \bar s \rbrace$: quality of product $j$ in period $t$
- Actions $a_{jt} \in \mathbb R^+$: investment decision of firm $j$ in period $t$
- Static payoffs $$ \pi_j (s_{jt}, \boldsymbol s_{-jt}, a_{jt}; \theta^\pi) $$ where
- $\boldsymbol s_{-jt}$: state vector of all other firms in period $t$
- $\theta^\pi$: parameters that govern static profits
Note: if we micro-fund $\pi(\cdot)$, e.g. with some demand and supply model, we have 2 strategic decisions: prices (static) and investment (dynamic).
Model (2)
- State transitions $$ \boldsymbol s_{t+1} = f(\boldsymbol s_t, \boldsymbol a_t, \boldsymbol \epsilon_t; \theta^f) $$ where
- $\boldsymbol a_t$: vector of actions of all firms
- $\boldsymbol \epsilon_t$: vector of idiosyncratic shocks
- $\theta^f$: parameters that govern state transitions
- Objective function: firms maximize expected discounted future profits $$ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] $$
Value Function
The value function of firm $j$ at time $t$ in state $\mathbf{s}_{t}$, under a set of strategy functions $\boldsymbol P$ (one for each firm), is $$ V_{j}^{\boldsymbol P_{-j}} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace $$ where
- $\pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi )$ are the static profits of firm $j$ given action $a_{jt}$ and the policy functions $\boldsymbol P_{-j}$ of all firms apart from $j$
- The expectation $\mathbb E$ is taken with respect to the conditional transition probabilities $f^{\boldsymbol P_{-j}} (\mathbf{s}_{t+1} | \mathbf{s}_{t}, a_{jt} ; \theta^f)$
Equilibrium
Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)
- Assumption: players’ strategies at period $t$ are functions only of payoff-relevant state variables at the same period
- Definition: a set of $J$ value and policy functions, $\boldsymbol V$ and $\boldsymbol P$, such that each firm
- maximizes its value function $V_j$
- given the policy function of every other firm $\boldsymbol P_{-j}$
What is it basically?
- Nash Equilibrium in the policy functions
- What are we ruling out?
- Strategies that depend on longer histories
- E.g. “has anyone ever cheated in a cartel?”
Estimation
We want to estimate 2 sets of parameters:
- $\theta^\pi$: parameterizes period profit function $\pi(\cdot)$
- $\theta^f$: parameterizes state transition function $f(\cdot)$
Generally 2 approaches
- Full solution
- Impractical (we’ll see more details later)
- Rely on some sort of Hotz and Miller (1993) CCP inversion
BBL Overview
Bajari, Benkard, and Levin (2007) plan
- Estimate transition probabilities and conditional choice probabilities from the data
- Use them to simulate the expected value function, given a set of parameters
- Use optimality of estimated choices to pin down static profit parameters
- I.e. repeat (2) for alternative strategies
- By definition suboptimal
- Estimating equation: values implied by observed strategies should be higher than values implied by alternative strategies
BBL: First Stage
- Estimate the transition probabilities $f ( \cdot | a_{jt}, \boldsymbol s_t; \hat \theta^f )$
- I.e. what is the observed frequency of any state-to-state transition?
- For any given action of firm $j$
- … and the conditional choice probabilities $\hat P_j(\cdot | \boldsymbol s_t)$
- I.e. what is the probability of each action, for each firm $j$, in each state $\boldsymbol s$?
- Can be done non-parametrically
- i.e. just observe frequencies
- Conditional on having enough data
- Note: need to estimate transitions, conditional on each state and action
- Problem with many states and actions, but especially with many players
- Curse of dimensionality
- Number of states increases exponentially in the number of players
Important: parametric assumptions in the first stage could contradict the model when the estimates are used to compute value/policy functions
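As a concrete illustration, here is a minimal Python sketch of the frequency estimator, assuming a hypothetical panel with one row per (market, period, firm); the column names and toy data are made up, not from BBL:

```python
import pandas as pd

# Hypothetical panel: the market-level state vector is encoded as a tuple,
# "action" is the firm's (discretized) choice, "next_state" the realized transition
df = pd.DataFrame({
    "state":      [(1, 2), (1, 2), (2, 2), (1, 1)],
    "action":     [0, 1, 1, 0],
    "next_state": [(2, 2), (2, 2), (2, 1), (1, 1)],
})

# CCPs: observed frequency of each action in each state
ccp = (
    df.groupby("state")["action"]
      .value_counts(normalize=True)
      .rename("prob")
)

# Transitions: observed frequency of each state-to-state move, given the action
trans = (
    df.groupby(["state", "action"])["next_state"]
      .value_counts(normalize=True)
      .rename("prob")
)
print(ccp, trans, sep="\n\n")
```

With enough data this is fully non-parametric: the estimates are just cell frequencies, which is exactly why the number of cells ($k^J$ states) becomes the binding constraint.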
BBL: Second Stage
First step: from transitions $f(\hat \theta^f)$ and CCPs $\boldsymbol{\hat P}$ to values
- We can use transitions and CCPs to simulate histories (of length $\tilde T$)
- of states $\lbrace \boldsymbol{\tilde s}_{\tau} \rbrace_{\tau = 1}^{\tilde T}$
- and actions $\lbrace \boldsymbol{\tilde a}_{\tau} \rbrace_{\tau = 1}^{\tilde T}$
- Given a parameter value $\tilde \theta^\pi$, we can compute static payoffs: $\pi_{j}^{\boldsymbol{\hat P}_{-j}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right)$
- Simulated history + static payoffs = simulated value function $$ V_{j}^{\boldsymbol{\hat P}} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol{\hat P}_{-j}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right) $$
- We can average over many, e.g. $R$, simulated value functions to get an expected value function $$ V_{j}^{\boldsymbol{\hat P}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \frac{1}{R} \sum_{r=1}^{R}\Bigg( \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol{\hat P}_{-j}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right) \Bigg) $$
In practice, for a parameter value $\tilde \theta^\pi$:
For $r = 1, …, R$ simulations do:
- Initialize firm values to zero
- For $\tau = 0, …, \tilde T$ do:
- In the current state $\boldsymbol{\tilde s}^{(r)}_{\tau}$, use $\boldsymbol{\hat P}$ to draw a vector of firm actions $\boldsymbol{\tilde a}^{(r)}_{\tau}$
- For each firm $j = 1, …, J$ do:
- Compute static profits $\pi_{j}^{\boldsymbol{\hat P}_{-j}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right)$
- Add discounted profits $\beta^{\tau} \pi_{j}^{\boldsymbol{\hat P}_{-j}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right)$ to the value function
- Use $f ( \cdot | \boldsymbol{\tilde a}^{(r)}_{\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau}; \hat \theta^f )$ to draw the next state $\boldsymbol{\tilde s}^{(r)}_{\tau + 1}$
- Use the next state $\boldsymbol{\tilde s}^{(r)}_{\tau + 1}$ as the current state for the next iteration
Then average all the value functions together to obtain an expected value function $V_{j}^{\boldsymbol{\hat P}, R} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right)$
Note: advantage of simulations: can be parallelized
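A minimal Python sketch of this forward-simulation step; the `ccp` array, `profit`, and `draw_next_state` below are hypothetical toy placeholders, not the BBL implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
J, S = 2, 3                    # firms, quality levels per firm (toy sizes)
beta, T_sim, R = 0.95, 100, 50 # discount factor, horizon, simulations

# Hypothetical first-stage estimate:
# ccp[j, s1, s2] = probability that firm j invests in market state (s1, s2)
ccp = rng.uniform(0.2, 0.8, size=(J, S, S))

def profit(j, s, a, theta):
    # placeholder static profit pi_j(s, a; theta): own quality minus investment cost
    return theta[0] * s[j] - theta[1] * a[j]

def draw_next_state(s, a):
    # placeholder transition: successful investment raises own quality by one
    return tuple(min(s[j] + a[j], S - 1) for j in range(J))

def simulated_value(s0, theta):
    """Average discounted profit stream of firm 0 over R simulated histories."""
    total = 0.0
    for _ in range(R):
        s = s0
        for tau in range(T_sim):
            # draw each firm's action from the estimated CCPs
            a = tuple(int(rng.random() < ccp[j, s[0], s[1]]) for j in range(J))
            total += beta ** tau * profit(0, s, a, theta)
            s = draw_next_state(s, a)
    return total / R

print(simulated_value((0, 0), theta=(1.0, 0.5)))
```

The outer loop over $r$ is embarrassingly parallel, which is the parallelization advantage noted above.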
Objective Function
What have we done so far?
- Given some parameters $\theta^\pi$, we computed the expected value function
How do we pick the $\theta^\pi$ that best rationalizes the data?
- I.e. what is the objective function?
- Potentially many options
BBL idea
- The estimated CCPs have to be optimal
- I.e. any other policy function should give a lower expected value
- “Best” $\theta^\pi$: those for which the implied expected value function under the estimated CCPs is greater than the one implied by any other CCPs
Note: it’s an inequality statement
Objective Function (2)
Idea
- If the observed policy ${\color{green}{\boldsymbol{\hat P}}}$ is optimal,
- all other policies ${\color{red}{\boldsymbol{\tilde P}}}$
- … at the true parameters $\theta^\pi$
- … should give a lower expected value $$ V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \leq V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) $$
So which are the true parameters?
- Those for which any deviation from the observed policy ${\color{green}{\boldsymbol{\hat P}}}$ yields a lower value
- Objective function to minimize: violations under alternative policies ${\color{red}{\boldsymbol{\tilde P}}}$ $$ \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} $$
Estimator
Estimator: the $\theta^\pi$ that minimizes the average (squared) magnitude of violations across alternative policies ${\color{red}{\boldsymbol{\tilde P}}}$ (a minimal sketch follows at the end of this slide) $$ \hat{\theta}^\pi = \arg \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} $$
- $\min \Big\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} - V_j^{{\color{red}{\boldsymbol{\tilde P}}}, R} \ , \ 0 \Big\rbrace$ picks only the violations
- If ${\color{green}{\boldsymbol{\hat P}}}$ implies a higher value, we can ignore the pair
- It doesn’t matter by how much the inequality is satisfied
- Which alternative policies ${\color{red}{\boldsymbol{\tilde P}}}$ should we use?
- In principle, any perturbation is fine
- But in practice, if we perturb too much, we can end up too far off
- Tip 1: start with very small perturbations
- Tip 2: use perturbations that meaningfully affect the dynamics
- E.g. exiting in a state in which a firm is not a competitive threat
- Tip 3: use perturbations on dimensions that are relevant for the research question
- E.g. they affect dimensions where you want to make counterfactual predictions
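A minimal sketch of this objective in Python, assuming hypothetical `value_hat` and `value_alts` value functions built from forward simulations as above:

```python
from scipy.optimize import minimize

def bbl_objective(theta, states, value_hat, value_alts):
    """Sum of squared violations of the BBL optimality inequalities.

    value_hat(s, theta): expected value under the estimated CCPs
    value_alts:          list of value functions under perturbed CCPs
    """
    obj = 0.0
    for s in states:
        v_hat = value_hat(s, theta)
        for value_tilde in value_alts:
            # only violations (v_hat < v_tilde) contribute to the objective
            obj += min(v_hat - value_tilde(s, theta), 0.0) ** 2
    return obj

# Hypothetical usage, once states / value_hat / value_alts are constructed:
# res = minimize(bbl_objective, x0=[1.0, 0.5],
#                args=(states, value_hat, value_alts), method="Nelder-Mead")
```

A derivative-free method like Nelder-Mead is a natural (if not the only) choice here, since the simulated objective is piecewise and noisy in $\tilde\theta^\pi$.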
Advantages
We have seen that there are competing methods.
What are the advantages of Bajari, Benkard, and Levin (2007) over those?
- Continuous actions
- BBL does not require actions to be discretised
- You can just sample actions from the data!
- Choice of alternative CCPs
- The researcher is free to choose the alternative CCPs ${\color{red}{\boldsymbol{\tilde P}}}$
- Pros: can make source of variation more transparent
- allows the researcher to focus on those predictions of the model that are key for the specific research questions
- Cons: it’s a very high dimensional space
- There are very very many alternative policy functions
Problems
- Computational curse of dimensionality is gone (in the state space)
- But we have a curse of dimensionality in data
- Need a lot of markets because now 1 market is 1 observation
- Multiple equilibria??
- We are basically assuming it away
- By estimating the CCPs in the first stage, we assume that the same equilibrium is played in all markets at all times
- To run counterfactuals, we still need to solve the model
- Unobserved heterogeneity
- Kasahara and Shimotsu (2009): how to identify the (minimum) number of unobserved types
- Arcidiacono and Miller (2011): how to use an EM algorithm for the 1st stage estimation with unobserved types, conditional on the number of types
- Berry and Compiani (2021): instrumental variables approach, relying on observed states in the distant past
- Non-stationarity
- If we have a long time period, something fundamental might have changed
Ericson Pakes (1995)
Introduction
Ericson and Pakes (1995) and companion paper Pakes and McGuire (1994) for the computation
- $J$ firms indexed by $j \in \lbrace 1, …, J \rbrace$
- Time $t$ is discrete, horizon is infinite
- State $s_{jt}$: quality of firm $j$ in period $t$
- Per-period profits $$ \pi (s_{jt}, \boldsymbol s_{-jt} ; \theta^\pi) $$ where
- $\boldsymbol s_{-jt}$: state vector of all other firms in period $t$
- $\theta^\pi$: parameters that govern static profits
- We can micro-fund profits with some demand and supply functions
- There can be some underlying static strategic interaction
- E.g. logit demand and Bertrand competition in prices $p_{jt}$
State Transitions
Investment: firms can invest a dollar amount $x$ to increase their future quality
- Continuous decision variable ($\neq$ Rust)
- Probability that investment is successful $$ \Pr \big(i_{jt} = 1 \ \big| \ a_{jt} = x \big) = \frac{\alpha x}{1 + \alpha x} $$
- Higher investment, higher success probability
- $\alpha$ parametrizes the returns on investment
Quality depreciation
- With probability $\delta$, quality decreases by one level
Law of motion $$ s_{j,t+1} = s_{jt} + i_{jt} - d_{jt}, \quad d_{jt} \sim \text{Bernoulli}(\delta) $$
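A minimal simulation sketch of this transition process; the parameter values are hypothetical, and the quality grid is clipped at 1 and `s_max` for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def next_quality(s, x, alpha=1.0, delta=0.1, s_max=10):
    """One draw of s' = s + i - d, with Pr(i=1) = alpha*x/(1+alpha*x), Pr(d=1) = delta."""
    i = rng.random() < alpha * x / (1 + alpha * x)  # investment success
    d = rng.random() < delta                        # depreciation shock
    return int(np.clip(s + int(i) - int(d), 1, s_max))

print([next_quality(5, x=2.0) for _ in range(10)])
```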
Decision Variables
Note that in Ericson and Pakes (1995) we have two separate decision variables
- Static decision variable: price $p_{jt}$
- Dynamic decision variable: investment $i_{jt}$
Does not have to be the case!
Example: Besanko et al. (2010)
- Model of learning-by-doing: firms decrease their marginal cost through sales
- State variable: firm stock of know how $e$
- The higher the stock of know-how, the lower the marginal cost
- Increases when a firm manages to make a sale
- $q \in [0,1]$ now is both static quantity and transition probability
- Single decision variable: price $p$
- Usual static effects on profits $\pi_{jt} = (p_{jt} - c(e_{jt})) \cdot q_j(\boldsymbol p_t)$
- But also dynamic effect through transition probabilities
- Probability of increasing $e_t$: $q_j(\boldsymbol p_t)$
Equilibrium
Firms maximize the expected flow of discounted profits $$ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] $$
Markov Perfect Equilibrium
Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)
- A set of $J$ value and policy functions, $\boldsymbol V$ and $\boldsymbol P$, such that each firm
- maximizes its value function $V_j$
- given the policy function of every other firm $\boldsymbol P_{-j}$
Exit
One important extension is exit.
- In each time period, incumbents decide whether to stay
- … or exit and get a scrap value $\phi^{exit}$
The Bellman equation of incumbent $j$ at time $t$ is $$ V_{j}^{\boldsymbol P_{-j}} (\mathbf{s}_{t}) = \max_{d^{exit}_{jt} \in \lbrace 0, 1 \rbrace} \Bigg\lbrace \begin{array}{c} \beta \phi^{exit} \ , \newline \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Big\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Big\rbrace \end{array} \Bigg\rbrace $$ where
- $\phi^{exit}$: exit scrap value
- $d^{exit}_{jt} \in \lbrace 0,1 \rbrace$: exit decision
Entry
We can also incorporate endogenous entry.
- One or more potential entrants exist outside the market
- They can pay an entry cost $\phi^{entry}$ and enter the market at a quality state $\bar s$
- … or remain outside at no cost
Value function $$ V_{j}^{\boldsymbol P_{-j}} (\boldsymbol s_{t} ; \theta) = \max_{d^{entry} \in \lbrace 0,1 \rbrace } \Bigg\lbrace \begin{array}{c} 0 \ , \newline - \phi^{entry} + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} (\bar s, \boldsymbol s_{-j, t+1} ; \theta) \ \Big| \ \boldsymbol s_{t} ; \theta^f \Big] \end{array} \Bigg\rbrace $$ where
- $d^{entry} \in \lbrace 0,1 \rbrace$: entry decision
- $\phi^{entry}$: entry cost
- $\bar s$: the state at which an entrant enters (could be random)
Do we observe potential entrants?
- Igami (2017): firms in the tech industry announce their entry
- Critique: these are not really potential entrants, they are halfway inside
Equilibrium Existence
Doraszelski and Satterthwaite (2010): an MPE might not exist in the Ericson and Pakes (1995) model.
Solution
- Replace fixed entry costs $\phi^{entry}$ and exit scrap values $\phi^{exit}$ with random ones
- It becomes a game of incomplete information
- First explored in Rust (1994)
- New equilibrium concept
Markov Perfect Bayesian Nash Equilibrium (MPBNE)
- Basically the same, with rational beliefs on random variables
Solving the Model
Solving the model is very similar to Rust
- Given parameter values $\theta$
- Start with a guess for the value and policy functions
- Until convergence, do (sketched below):
- For each firm $j = 1, …, J$, do:
- Take the policy functions of all other firms as given
- Compute the implied transition probabilities
- Use them to compute the new policy function for firm $j$
- Compute the implied value function
Where do things get complicated / tricky? The policy function update.
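A minimal Python sketch of this loop for a two-firm game with discrete actions; the `profit` and `trans` primitives are hypothetical placeholders, and convergence is not guaranteed (that is exactly the tricky part discussed next):

```python
import itertools
import numpy as np

S, A, J, beta = 5, 2, 2, 0.95  # toy sizes: states, actions, firms

def solve_mpe(profit, trans, tol=1e-8, max_iter=10_000):
    """Best-response value iteration for a 2-firm game.

    profit[j][s1, s2, a]  : static profit of firm j
    trans[s1, s2, a1, a2] : distribution over next states (an S x S array)
    """
    V = np.zeros((J, S, S))
    P = np.zeros((J, S, S), dtype=int)
    for _ in range(max_iter):
        V_new, P_new = V.copy(), P.copy()
        for j in range(J):
            for s1, s2 in itertools.product(range(S), repeat=2):
                a_opp = P[1 - j, s1, s2]  # opponent plays its current policy
                best = -np.inf
                for a in range(A):
                    a1, a2 = (a, a_opp) if j == 0 else (a_opp, a)
                    ev = np.sum(trans[s1, s2, a1, a2] * V[j])  # E[V | s, a]
                    val = profit[j][s1, s2, a] + beta * ev
                    if val > best:
                        best, P_new[j, s1, s2] = val, a
                V_new[j, s1, s2] = best
        if np.max(np.abs(V_new - V)) < tol and np.array_equal(P_new, P):
            return V_new, P_new
        V, P = V_new, P_new
    raise RuntimeError("best-response iteration did not converge")
```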
Policy Update Example: exit game
Imagine a stylized exit game with 2 firms
- Easy to end up with best responses of the form: “exit if the opponent stays, stay if the opponent exits”
Computationally
- Initialize policy functions to $(exit, exit)$
- Iteration 1:
- Each firm takes opponent policy as given: $exit$
- Update own optimal policy: $stay$
- New policy: $(stay, stay)$
- Iteration 2: $(stay, stay) \to (exit, exit)$
- Iteration 3: $(exit, exit) \to (stay, stay)$
- Etc…
Issues: value function iteration might not converge, and equilibrium multiplicity.
Convergence Tips
- Try different starting values
- Often it’s what makes the biggest difference
- Ideally, start as close as possible to true values
- Approximation methods can help (we’ll see more later)
- I.e. get a fast approximation to use as starting value for the solution algorithm
- Partial/stochastic value function update rule (see the sketch after this list)
- Instead of $V' = T(V)$, use $V' = \alpha T(V) + (1-\alpha)V$
- Very good for breaking loops, especially if $\alpha$ is stochastic, e.g. $\alpha \sim U(0,1)$
- How large is the support of the entry/exit costs?
- If support is too small, you end up back in the entry/exit loop
- Try alternative non-parallel updating schemes
- E.g. update the value one state at a time (in random order?)
- Last but not least: change the model
- In particular, from simultaneous to alternating moves
- or continuous time
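A minimal sketch of the damped update rule from the tip above, assuming some update operator `T` (e.g. a Bellman-style operator from the solution algorithm) is already available:

```python
import numpy as np

rng = np.random.default_rng(0)

def damped_iteration(T, V0, tol=1e-8, max_iter=100_000):
    """Iterate V' = alpha*T(V) + (1-alpha)*V with stochastic alpha ~ U(0,1)."""
    V = V0
    for _ in range(max_iter):
        alpha = rng.random()  # stochastic damping helps break 2-cycles
        V_new = alpha * T(V) + (1 - alpha) * V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("no convergence")
```

The intuition: a deterministic cycle like $(exit, exit) \leftrightarrow (stay, stay)$ cannot persist when each step takes a random-sized partial step toward the update.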
Multiple Equilibria
How to find them?
- Besanko et al. (2010) and Borkovsky, Doraszelski, and Kryukov (2010): homotopy method
- can find some equilibria, but not all
- complicated to implement: need to compute first order conditions $H(\boldsymbol V, \theta) = 0$ and their Jacobian $\Delta H(\boldsymbol V, \theta)$
- Idea: trace the equilibrium correspondence $H^{-1} = \lbrace (\boldsymbol V, \theta) : H(\boldsymbol V, \theta) = 0 \rbrace$ in the value-parameter space
- Eibelshäuser and Poensgen (2019): Markov Quantal Response Equilibrium
- approach dynamic games from an evolutionary game theory perspective
- actions are played at random and those bringing the highest payoffs survive
- $\to$ homotopy method guaranteed to find one equilibrium
- Pesendorfer and Schmidt-Dengler (2010): some equilibria are not Lyapunov-stable
- best-response iteration cannot find them unless you start exactly at the solution
- Su and Judd (2012) and Egesdal, Lai, and Su (2015): same point, but numerically, using the MPEC approach
Multiple Equilibria (2)
Can we assume them away?
- Igami (2017)
- Finite horizon
- Homogeneous firms (in profit functions and state transitions)
- One dynamic move per period (overall, not per-firm)
- Abbring and Campbell (2010)
- Entry/exit game
- Homogeneous firms
- Entry and exit decisions follow a last-in first-out (LIFO) structure
- “An entrant expects to produce no longer than any incumbent”
- Iskhakov, Rust, and Schjerning (2016)
- can find all equilibria, but only for a very specific class of dynamic games
- the state must always proceed “forward”
- e.g. either entry or exit, but not both
- Idea: can solve by backward induction even if horizon is infinite
Curse of Dimensionality
What are the computational bottlenecks? $$ V_{j}^{\boldsymbol P_{-j}} ({\color{red}{\mathbf{s}_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{{\color{red}{\mathbf{s}_{t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace $$
- Dimension of the state space
- In single-agent problems, we have as many states as values of $s_{jt}$ ($k$)
- In dynamic games, the state space goes from $k$ to $k^J$
- symmetry helps: states $[1,2,3]$ and $[1,3,2]$ become the same for firm 1
- How much do we gain? From $k^J$ to $k \cdot {k + J - 2 \choose k - 1}$ (checked numerically after this slide)
- Dimension of the integrand
- If in single-agent problems we have to integrate over $\kappa$ outcomes,
- 4 in Rust: engine replaced (yes|no) $\times$ mileage increases (yes|no)
- … in dynamic games, we have to consider $\kappa^J$ outcomes
Note: the bottlenecks are not additive but multiplicative: we have to solve the expectation at each point in the state space. Improving on either of the two helps a lot.
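The gain from symmetry can be checked directly from the formula above; a quick computation for a few hypothetical values of $k$ and $J$:

```python
from math import comb

def n_states(k, J):
    full = k ** J                             # asymmetric state space
    symmetric = k * comb(k + J - 2, k - 1)    # own state x multiset of rivals
    return full, symmetric

for k, J in [(10, 2), (10, 5), (10, 10)]:
    print(k, J, n_states(k, J))   # e.g. k=10, J=5: 100,000 vs 7,150
```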
Curse of Dimensionality (2)
Two and a half classes of solutions:
- Computational: approximate the equilibrium
- Conceptual: define another game
- Kind of both: Pakes and McGuire (2001)
- experience-based equilibrium (Fershtman and Pakes 2012)
Note: useful also to get good starting values for a full solution method!
Oblivious Equilibrium
Weintraub, Benkard, and Van Roy (2008): what if firms had no idea about the state of other firms?
- or atomistic firms
The value function becomes $$ V_{j} ({\color{red}{s_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left({\color{red}{s_{t}}}\right)} \Bigg\lbrace {\color{red}{\mathbb E_{\boldsymbol s_t}}} \Big[ \pi_{j} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) \ \Big| \ \boldsymbol P \Big] + \beta \mathbb E_{{\color{red}{s_{t+1}}}} \Big[ V_{j} \left({\color{red}{s_{t+1}}}\right) \ \Big| \ a_{jt}, {\color{red}{s_{t}}} ; \theta^f \Big] \Bigg\rbrace $$
- Now the state is just $s_t$ instead of $\boldsymbol s_t$
- Huge computational gain: from $k^J$ points to $k$
- Also the expectation over future states is taken over $3$ instead of $3^J$ points
- (3 because quality can go up, down, or stay the same)
- But we need to compute static profits as an expected value given the current policy function
- Need to keep track of the asymptotic state distribution as we iterate on the value
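A minimal sketch of oblivious value iteration, with hypothetical `profit`, `trans_own`, and `mu` arrays:

```python
import numpy as np

def oblivious_values(profit, trans_own, mu, beta=0.95, tol=1e-10):
    """Value iteration over the OWN state only: k points instead of k^J.

    profit[s_own, s_opp, a] : static profit against a single rival
    trans_own[s_own, a, s'] : own-state transition probabilities
    mu[s_opp]               : long-run distribution of the rival's state
    """
    pi_bar = np.einsum("soa,o->sa", profit, mu)    # E_mu[ pi | s_own, a ]
    V = np.zeros(profit.shape[0])
    while True:
        EV = np.einsum("sat,t->sa", trans_own, V)  # E[ V(s') | s_own, a ]
        V_new = (pi_bar + beta * EV).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Note that in a full oblivious-equilibrium computation, `mu` is itself recomputed from the implied policy and iterated to a fixed point, which is the bookkeeping mentioned in the last bullet above.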
Games with Random Moves
Doraszelski and Judd (2019): what if instead of simultaneously, firms would move one at the time at random?
- Important: to have the same frequency of play, a period now is $J$ times shorter
The value function becomes $$ V_{j}^{\boldsymbol P_{-j}} (\mathbf{s}_{t}, {\color{red}{n=j}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace {\color{red}{\frac{1}{J}}} \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + {\color{red}{\sqrt[J]{\beta}}} \ \mathbb E_{{\color{red}{n, s_{j, t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}, {\color{red}{n}} \right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace $$
- $n$: indicates whose turn it is to play
- since a turn is $J$ times shorter, profits are $\frac{1}{J} \pi$ and the discount factor is $\sqrt[J]{\beta}$
Computational gain
- The expectation is now taken over $(n, s_{j, t+1})$ instead of $\boldsymbol s_{t+1}$
- I.e. $3J$ points instead of $3^J$ (3 because quality can go up, down, or stay the same)
- Huge computational difference!
Games in Continuous Time
Doraszelski and Judd (2012): what’s the advantage of continuous time?
- Probability that two firms take a decision simultaneously is zero
With continuous time, the value function becomes $$ V_{j}^{\boldsymbol P_{-j}} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \frac{1}{\lambda(a_{jt}) - \log(\beta)} \Bigg( \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \lambda(a_{jt}) \ \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg) \Bigg\rbrace $$
- $\lambda(a_{jt}) = \delta + \frac{\alpha a_{jt}}{1 + \alpha a_{jt}}$ is the hazard rate that something happens to firm $j$
- i.e. either an increase in quality, with probability $\frac{\alpha a_{jt}}{1 + \alpha a_{jt}}$
- … or a decrease in quality, with probability $\delta$
Computational gain
- Now the expectation over future states $\mathbb E_{\boldsymbol s_{t+1}}$ is over $2J$ points instead of $3^J$
- 3 because quality can go up, down, or stay the same
- 2 because in continuous time we don’t care if the state does not change (investment fails)
Comparison
Which method is best?
I compare them in Courthoud (2020)
- Fastest: Weintraub, Benkard, and Van Roy (2008)
- Effectively transforms the game into single-agent dynamics
- Best trade-off: Doraszelski and Judd (2019)
- Simple, practical, and also helps in terms of equilibrium multiplicity
- Also in Courthoud (2020): games with random order
- Better approximation than Doraszelski and Judd (2019)
- And similar computation time
Applications
Some applications of these methods include
- Approximation methods
- Oblivious equilibrium
- Xu and Chen (2020): R&D investment in the Korean electric motor industry
- Moment-based equilibrium: Ifrach and Weintraub (2017)
- Games in continuous time
- Arcidiacono et al. (2016): entry, exit and scale decisions in retail competition
- Games with random moves
- Igami (2017): innovation, entry, exit in the hard drive industry
From IO to AI
Bridging two Literatures
There is one method to approximate the equilibrium in dynamic games that is a bit different from the others: Pakes and McGuire (2001)
- Idea: approximate the value function by Monte Carlo simulation
- Firms start with a guess for the alternative-specific value function
- Act according to it
- Observe realized payoffs and state transitions
- And update the alternative-specific value function according to the realized outcomes
Experience-Based Equilibrium
- Defined in Fershtman and Pakes (2012)
- Def: policies are optimal given beliefs about state transitions, and observed transitions are consistent with those beliefs
- Note: definition silent on off-equilibrium path beliefs
Pakes and McGuire (2001)
- Players start with an alternative-specific value function
- yes, the ASV from Rust (1994)
- $\bar V_{j,a}^{(0)} (\boldsymbol s ; \theta)$: initial value of player $j$ for action $a$ in state $\boldsymbol s$
- Until convergence, do:
- Compute the optimal action, given $\bar V_{j, a}^{(t)} (\boldsymbol s ; \theta)$ $$ a^* = \arg \max_a \bar V_{j, a}^{(t)} (\boldsymbol s ; \theta) $$
- Observe the realized payoff $\pi_{j, a^*}(\boldsymbol s ; \theta)$ and the realized next state $\boldsymbol {s'}(\boldsymbol s, a^*; \theta)$
- Update the alternative-specific value function of the chosen action $a^*$ $$ \bar V_{j, a^*}^{(t+1)} (\boldsymbol s ; \theta) = (1-\alpha_{\boldsymbol s, t}) \bar V_{j, a^*}^{(t)} (\boldsymbol s ; \theta) + \alpha_{\boldsymbol s, t} \Big[\pi_{j, a^*}(\boldsymbol s ; \theta) + \beta \max_a \bar V_{j, a}^{(t)} (\boldsymbol s' ; \theta) \Big] $$ where
- $\alpha_{\boldsymbol s, t} = \frac{1}{\text{number of times state } \boldsymbol s \text{ has been visited}}$
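A minimal sketch of one update step in Python; `profit_fn` and `draw_next_state` are hypothetical callables, and states are assumed hashable (e.g. tuples):

```python
import numpy as np
from collections import defaultdict

beta, n_actions = 0.95, 2
Vbar = defaultdict(lambda: np.zeros(n_actions))  # alternative-specific values
visits = defaultdict(int)                        # visit counts per state

def pakes_mcguire_step(s, profit_fn, draw_next_state):
    """One update of the chosen action's value in state s."""
    a_star = int(np.argmax(Vbar[s]))       # best action so far
    pi = profit_fn(s, a_star)              # realized payoff
    s_next = draw_next_state(s, a_star)    # realized next state
    visits[s] += 1
    lr = 1.0 / visits[s]                   # alpha_{s,t} = 1 / (# visits to s)
    target = pi + beta * np.max(Vbar[s_next])
    Vbar[s][a_star] = (1 - lr) * Vbar[s][a_star] + lr * target
    return s_next
```

Other firms' behavior enters only through the realized payoffs and transitions, so no expectation over rivals' states ever needs to be computed.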
Comments
Where is the strategic interaction?
- Firms always take the “best action so far” in each state
- They start to take a new action only when the previous best has performed badly for many periods
- Reminiscent of the evolutionary game theory literature
Importance of starting values
- Imagine all payoffs are positive but values are initialized to zero
- The first action tried in each state $\to$ the only action ever taken in that state
- The algorithm gets stuck
- Why? Firms always take $\arg \max_a \bar V_a$ and never explore the alternatives
Convergence by design
- As $t \to \infty$, $\alpha_{\boldsymbol s, t} \to 0$
- Firms stop updating the value by design
Q-Learning
Computer Science reinforcement learning literature (AI): Q-learning
Differences
- $\bar V_a( \boldsymbol s)$ is called $Q_a(\boldsymbol s)$, hence the name
- Firms don’t always take the optimal action
- At the beginning of the algorithm: exploration
- Firms take actions at random
- Just to explore what happens when taking different actions
- Gradually shift towards exploitation
- I.e. take the optimal action, given $\bar V^{(t)}( \boldsymbol s)$ at iteration $t$
- I.e. shift towards Pakes and McGuire (2001)
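A minimal sketch of this exploration-exploitation tradeoff via an $\epsilon$-greedy rule; the exponential decay schedule is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(Q_s, t, eps0=1.0, decay=1e-4):
    """Epsilon-greedy: explore at random early, exploit Q later."""
    eps = eps0 * np.exp(-decay * t)       # exploration probability decays with t
    if rng.random() < eps:
        return int(rng.integers(len(Q_s)))  # explore: uniform random action
    return int(np.argmax(Q_s))              # exploit: best action so far
```

As `eps` goes to zero the rule coincides with the always-greedy choice of Pakes and McGuire (2001).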
Applications
- Doraszelski, Lewis, and Pakes (2018)
- Firms actually do learn by trial and error
- Setting: demand learning in the UK frequency response market (electricity)
- Asker et al. (2020)
- Uses Pakes and McGuire (2001) for estimation
- Setting: dynamic timber auctions with information sharing
- Calvano et al. (2020)
- Study Q-learning pricing algorithms
- In repeated price competition with differentiated products
- (Computational) lab experiment: what do these algorithms converge to?
- Finding: algorithms learn reward-punishment collusive strategies
Appendix
References
Abbring, Jaap H, and Jeffrey R Campbell. 2010. “Last-in First-Out Oligopoly Dynamics.” Econometrica 78 (5): 1491–1527.
Aguirregabiria, Victor, Allan Collard-Wexler, and Stephen P Ryan. 2021. “Dynamic Games in Empirical Industrial Organization.” National Bureau of Economic Research.
Aguirregabiria, Victor, and Pedro Mira. 2007. “Sequential Estimation of Dynamic Discrete Games.” Econometrica 75 (1): 1–53.
Arcidiacono, Peter, Patrick Bayer, Jason R Blevins, and Paul B Ellickson. 2016. “Estimation of Dynamic Discrete Choice Models in Continuous Time with an Application to Retail Competition.” The Review of Economic Studies 83 (3): 889–931.
Arcidiacono, Peter, and Robert A Miller. 2011. “Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity.” Econometrica 79 (6): 1823–67.
Asker, John, Chaim Fershtman, Jihye Jeon, and Ariel Pakes. 2020. “A Computational Framework for Analyzing Dynamic Auctions: The Market Impact of Information Sharing.” The RAND Journal of Economics 51 (3): 805–39.
Bajari, Patrick, C Lanier Benkard, and Jonathan Levin. 2007. “Estimating Dynamic Models of Imperfect Competition.” Econometrica 75 (5): 1331–70.
Barwick, Panle Jia, and Parag A Pathak. 2015. “The Costs of Free Entry: An Empirical Study of Real Estate Agents in Greater Boston.” The RAND Journal of Economics 46 (1): 103–45.
Berry, Steven T, and Giovanni Compiani. 2021. “Empirical Models of Industry Dynamics with Endogenous Market Structure.” Annual Review of Economics 13.
Besanko, David, Ulrich Doraszelski, Yaroslav Kryukov, and Mark Satterthwaite. 2010. “Learning-by-Doing, Organizational Forgetting, and Industry Dynamics.” Econometrica 78 (2): 453–508.
Borkovsky, Ron N, Ulrich Doraszelski, and Yaroslav Kryukov. 2010. “A User’s Guide to Solving Dynamic Stochastic Games Using the Homotopy Method.” Operations Research 58 (4-part-2): 1116–32.
Calvano, Emilio, Giacomo Calzolari, Vincenzo Denicolo, and Sergio Pastorello. 2020. “Artificial Intelligence, Algorithmic Pricing, and Collusion.” American Economic Review 110 (10): 3267–97.
Caoui, El Hadi. 2019. “Estimating the Costs of Standardization: Evidence from the Movie Industry.” R&R, Review of Economic Studies.
Collard-Wexler, Allan. 2013. “Demand Fluctuations in the Ready-Mix Concrete Industry.” Econometrica 81 (3): 1003–37.
Courthoud, Matteo. 2020. “Approximation Methods for Large Dynamic Stochastic Games.” Working Paper.
Doraszelski, Ulrich. 2003. “An R&D Race with Knowledge Accumulation.” RAND Journal of Economics, 20–42.
Doraszelski, Ulrich, and Kenneth L Judd. 2012. “Avoiding the Curse of Dimensionality in Dynamic Stochastic Games.” Quantitative Economics 3 (1): 53–93.
———. 2019. “Dynamic Stochastic Games with Random Moves.” Quantitative Marketing and Economics 17 (1): 59–79.
Doraszelski, Ulrich, Gregory Lewis, and Ariel Pakes. 2018. “Just Starting Out: Learning and Equilibrium in a New Market.” American Economic Review 108 (3): 565–615.
Doraszelski, Ulrich, and Mark Satterthwaite. 2010. “Computable Markov-Perfect Industry Dynamics.” The RAND Journal of Economics 41 (2): 215–43.
Egesdal, Michael, Zhenyu Lai, and Che-Lin Su. 2015. “Estimating Dynamic Discrete-Choice Games of Incomplete Information.” Quantitative Economics 6 (3): 567–97.
Eibelshäuser, Steffen, and David Poensgen. 2019. “Markov Quantal Response Equilibrium and a Homotopy Method for Computing and Selecting Markov Perfect Equilibria of Dynamic Stochastic Games.” Working Paper.
Ericson, Richard, and Ariel Pakes. 1995. “Markov-Perfect Industry Dynamics: A Framework for Empirical Work.” The Review of Economic Studies 62 (1): 53–82.
Esteban, Susanna, and Matthew Shum. 2007. “Durable-Goods Oligopoly with Secondary Markets: The Case of Automobiles.” The RAND Journal of Economics 38 (2): 332–54.
Farias, Vivek, Denis Saure, and Gabriel Y Weintraub. 2012. “An Approximate Dynamic Programming Approach to Solving Dynamic Oligopoly Models.” The RAND Journal of Economics 43 (2): 253–82.
Fershtman, Chaim, and Ariel Pakes. 2012. “Dynamic Games with Asymmetric Information: A Framework for Empirical Work.” The Quarterly Journal of Economics 127 (4): 1611–61.
Goettler, Ronald L, and Brett R Gordon. 2011. “Does AMD Spur Intel to Innovate More?” Journal of Political Economy 119 (6): 1141–1200.
Hotz, V Joseph, and Robert A Miller. 1993. “Conditional Choice Probabilities and the Estimation of Dynamic Models.” The Review of Economic Studies 60 (3): 497–529.
Huang, Ling, and Martin D Smith. 2014. “The Dynamic Efficiency Costs of Common-Pool Resource Exploitation.” American Economic Review 104 (12): 4071–4103.
Ifrach, Bar, and Gabriel Y Weintraub. 2017. “A Framework for Dynamic Oligopoly in Concentrated Industries.” The Review of Economic Studies 84 (3): 1106–50.
Igami, Mitsuru. 2017. “Estimating the Innovator’s Dilemma: Structural Analysis of Creative Destruction in the Hard Disk Drive Industry, 1981–1998.” Journal of Political Economy 125 (3): 798–847.
Iskhakov, Fedor, John Rust, and Bertel Schjerning. 2016. “Recursive Lexicographical Search: Finding All Markov Perfect Equilibria of Finite State Directional Dynamic Games.” The Review of Economic Studies 83 (2): 658–703.
Jeon, Jihye. 2020. “Learning and Investment Under Demand Uncertainty in Container Shipping.” The RAND Journal of Economics.
Kasahara, Hiroyuki, and Katsumi Shimotsu. 2009. “Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices.” Econometrica 77 (1): 135–75.
Maskin, Eric, and Jean Tirole. 1988. “A Theory of Dynamic Oligopoly, II: Price Competition, Kinked Demand Curves, and Edgeworth Cycles.” Econometrica: Journal of the Econometric Society, 571–99.
Pakes, Ariel, and Paul McGuire. 1994. “Computing Markov-Perfect Nash Equilibria: Numerical Implications of a Dynamic Differentiated Product Model.” RAND Journal of Economics 25 (4): 555–89.
———. 2001. “Stochastic Algorithms, Symmetric Markov Perfect Equilibrium, and the ‘Curse’ of Dimensionality.” Econometrica 69 (5): 1261–81.
Pakes, Ariel, Michael Ostrovsky, and Steven Berry. 2007. “Simple Estimators for the Parameters of Discrete Dynamic Games (with Entry/Exit Examples).” The RAND Journal of Economics 38 (2): 373–99.
Pesendorfer, Martin, and Philipp Schmidt-Dengler. 2008. “Asymptotic Least Squares Estimators for Dynamic Games.” The Review of Economic Studies 75 (3): 901–28.
———. 2010. “Sequential Estimation of Dynamic Discrete Games: A Comment.” Econometrica 78 (2): 833–42.
Rust, John. 1994. “Structural Estimation of Markov Decision Processes.” Handbook of Econometrics 4: 3081–3143.
Ryan, Stephen P. 2012. “The Costs of Environmental Regulation in a Concentrated Industry.” Econometrica 80 (3): 1019–61.
Su, Che-Lin, and Kenneth L Judd. 2012. “Constrained Optimization Approaches to Estimation of Structural Models.” Econometrica 80 (5): 2213–30.
Sweeting, Andrew. 2013. “Dynamic Product Positioning in Differentiated Product Markets: The Effect of Fees for Musical Performance Rights on the Commercial Radio Industry.” Econometrica 81 (5): 1763–803.
Vreugdenhil, Nicholas. 2020. “Booms, Busts, and Mismatch in Capital Markets: Evidence from the Offshore Oil and Gas Industry.” R&R at Journal of Political Economy.
Weintraub, Gabriel Y, C Lanier Benkard, and Benjamin Van Roy. 2008. “Markov Perfect Industry Dynamics with Many Firms.” Econometrica 76 (6): 1375–1411.
Xu, Daniel Yi, and Yanyou Chen. 2020. “A Structural Empirical Model of R&D, Firm Heterogeneity, and Industry Evolution.” Journal of Industrial Economics.