Setting: agents making strategic decisions (new) in dynamic environments.
Lit review: forthcoming IO Handbook chapter by Aguirregabiria, Collard-Wexler, and Ryan (2021)
Typically in IO we study agents in strategic environments. This gets complicated in dynamic environments.
We will cover first the estimation and then the computation of dynamic games
Last: bridge between Structural IO and Artificial Intelligence
Stylized version of Ericson and Pakes (1995) (no entry/exit)
\(J\) firms (products) indexed by \(j \in \lbrace 1, ..., J \rbrace\)
Time \(t\) is discrete, horizon is infinite
States \(s_{jt} \in \lbrace 1, ... \bar s \rbrace\): quality of product \(j\) in period \(t\)
Actions \(a_{jt} \in \mathbb R^+\): investment decision of firm \(j\) in period \(t\)
Static payoffs \[ \pi_j (s_{jt}, \boldsymbol s_{-jt}, a_{jt}; \theta^\pi) \] where
Note: if we micro-found \(\pi(\cdot)\), e.g. with some demand and supply model, we have 2 strategic decisions: prices (static) and investment (dynamic).
State transitions \[ \boldsymbol s_{t+1} = f(\boldsymbol s_t, \boldsymbol a_t, \boldsymbol \epsilon_t; \theta^f) \] where
Objective function: firms maximize expected discounted future profits \[ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] \]
The value function of firm \(j\) at time \(t\) in state \(\boldsymbol s_{t}\), under a set of strategy functions \(\boldsymbol P\) (one for each firm) is \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \] where
\(\pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi )\) are the static profits of firm \(j\) given action \(a_{jt}\) and policy functions \(\boldsymbol P_{-j}\) for all firms apart from \(j\)
The expectation \(\mathbb E\) is taken with respect to the conditional transition probabilities \(f^{\boldsymbol P_{-j}} (\mathbf{s}_{t+1} | \mathbf{s}_{t}, a_{jt} ; \theta^f)\)
Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)
What is it basically?
We want to estimate 2 sets of parameters:
Generally 2 approaches
Bajari, Benkard, and Levin (2007) plan
Important: parametric assumptions would contradict the model for the estimation of value/policy functions
First step: from transitions \(f(\hat \theta^f)\) and CCPs \(\boldsymbol{\hat P}\) to values
We can use transitions and CCPs to simulate histories (of length \(\tilde T\))
Given a parameter value \(\tilde \theta^\pi\), we can compute static payoffs: \(\pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right)\)
Simulated history + static payoffs = simulated value function \[ {V}_{j}^{\boldsymbol {\hat{P}}} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left( \tilde a_{j\tau}, \boldsymbol{\tilde s}_{\tau} ; \tilde \theta^\pi \right) \]
We can average over many, e.g. \(R\), simulated value functions to get an expected value function \[ {V}_{j}^{\boldsymbol {\hat{P}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) = \frac{1}{R} \sum_{r=1}^{R}\Bigg( \sum_{\tau=0}^{\tilde T} \beta^{\tau} \pi_{j}^{\boldsymbol {\hat{P}_{-j}}} \left(\tilde a^{(r)}_{j\tau}, \boldsymbol{\tilde s}^{(r)}_{\tau} ; \tilde \theta^\pi \right) \Bigg) \]
For \(r = 1, ..., R\) simulations do:
Then average all the value functions together to obtain an expected value function \(V_{j}^{\boldsymbol {\hat{P}}, R} \left(\boldsymbol{s}_{t} ; \tilde \theta^\pi \right)\)
Note: advantage of simulations: can be parallelized
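The forward-simulation step above can be sketched in Python as follows. This is a minimal illustration, not the BBL implementation: the function `simulate_value` and the toy primitives `toy_policy`, `toy_transition`, and `toy_payoff` are hypothetical stand-ins for the estimated CCPs \(\boldsymbol{\hat P}\), the estimated transitions \(f(\hat \theta^f)\), and the static payoff \(\pi_j\), and the state is reduced to a single integer for brevity.

```python
import numpy as np

def simulate_value(s0, policy, transition, payoff, theta, beta, T, R, seed=0):
    """Average of R simulated discounted payoff streams, each with T + 1 periods."""
    rng = np.random.default_rng(seed)
    values = np.empty(R)
    for r in range(R):
        s, v = s0, 0.0
        for tau in range(T + 1):
            a = policy(s, rng)                      # action drawn from the (estimated) policy
            v += beta ** tau * payoff(s, a, theta)  # discounted static payoff
            s = transition(s, a, rng)               # next state from the (estimated) transitions
        values[r] = v
    return values.mean()

# Toy primitives, made up purely for illustration.
def toy_policy(s, rng):
    return 1.0                                      # always invest one unit

def toy_transition(s, a, rng):
    return min(s + int(rng.random() < 0.5), 5)      # quality rises with prob. 1/2, capped at 5

def toy_payoff(s, a, theta):
    return theta * s - a                            # profit linear in quality, net of investment

v = simulate_value(s0=1, policy=toy_policy, transition=toy_transition,
                   payoff=toy_payoff, theta=2.0, beta=0.9, T=100, R=200)
```

Each of the `R` histories is independent of the others, which is why the outer loop is the natural place to parallelize.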
What have we done so far?
How do we pick the \(\theta^\pi\) that best rationalizes the data?
BBL idea
Note: it’s an inequality statement
If the observed policy \({\color{green}{\boldsymbol{\hat P}}}\) is optimal,
All other policies \({\color{red}{\boldsymbol{\tilde P}}}\)
… at the true parameters \(\theta^\pi\)
… should give a lower expected value \[ V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \leq V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \]
So which are the true parameters?
Those for which any deviation from the observed policy \({\color{green}{\boldsymbol{\hat P}}}\) yields a lower value
Objective function to minimize: violations under alternative policies \({\color{red}{\boldsymbol{\tilde P}}}\) \[ \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} \]
Estimator: \(\theta^\pi\) that minimizes the average (squared) magnitude of violations for any alternative policy \({\color{red}{\boldsymbol{\tilde P}}}\) \[ \hat{\theta}^\pi= \arg \min_{\tilde \theta^\pi} \sum_{\boldsymbol s_{t}} \sum_{{\color{red}{\boldsymbol{\tilde P}}}} \Bigg[\min \bigg\lbrace V_{j}^{{\color{green}{\boldsymbol{\hat P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) - V_{j}^{{\color{red}{\boldsymbol{\tilde P}}}, R} \left( \boldsymbol{s}_{t} ; \tilde \theta^\pi \right) \ , \ 0 \bigg\rbrace \Bigg]^{2} \]
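As a rough sketch, the criterion above can be evaluated as follows once the simulated values are stored in arrays. The function name `bbl_objective` and the array layout (states in rows, alternative policies in columns) are assumptions made for this example, and the numbers are invented.

```python
import numpy as np

def bbl_objective(v_hat, v_tilde):
    """Sum of squared violations of V_hat >= V_tilde.

    v_hat:   shape (n_states,), simulated values under the observed policy
    v_tilde: shape (n_states, n_alt), simulated values under alternative policies
    """
    diff = v_hat[:, None] - v_tilde                 # positive when the observed policy does better
    return float(np.sum(np.minimum(diff, 0.0) ** 2))  # only negative differences are penalized

# Invented values: 2 states, 2 alternative policies.
v_hat = np.array([10.0, 8.0])
v_tilde = np.array([[9.0, 11.0],
                    [8.5, 7.0]])
loss = bbl_objective(v_hat, v_tilde)                # violations of size 1.0 and 0.5
```

In an estimation routine, both arrays would be recomputed for each candidate \(\tilde \theta^\pi\) and the loss passed to a numerical minimizer.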
We have seen that there are competing methods.
What are the advantages of Bajari, Benkard, and Levin (2007) over those?
Ericson and Pakes (1995) and companion paper Pakes and McGuire (1994) for the computation
\(J\) firms indexed by \(j \in \lbrace 1, ..., J \rbrace\)
Time \(t\) is discrete, horizon is infinite
State \(s_{jt}\): quality of firm \(j\) in period \(t\)
Per period profits \[ \pi (s_{jt}, \boldsymbol s_{-jt} ; \theta^\pi) \] where
We can micro-found profits with some demand and supply functions
Investment: firms can invest a dollar amount \(x\) to increase their future quality
Continuous decision variable (\(\neq\) Rust)
Probability that investment is successful \[ \Pr \big(i_{jt} = 1 \ \big| \ a_{jt} = x \big) = \frac{\alpha x}{1 + \alpha x} \]
Higher investment, higher success probability
\(\alpha\) parametrizes the returns on investment
Quality depreciation
Law of motion \[ s_{j,t+1} = s_{jt} + i_{jt} - \delta \]
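The investment technology and law of motion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the depreciation shock is taken to be a Bernoulli draw with probability `delta_prob` (the notes do not specify its distribution), quality is clipped to a grid \(\lbrace 1, ..., \bar s \rbrace\), and all parameter values are made up.

```python
import numpy as np

def success_prob(x, alpha):
    """Probability that an investment of x dollars succeeds: alpha*x / (1 + alpha*x)."""
    return alpha * x / (1.0 + alpha * x)

def next_quality(s, x, alpha, delta_prob, s_bar, rng):
    """One draw of s' = s + i - delta, clipped to the grid {1, ..., s_bar}.

    Assumption: the depreciation shock delta is Bernoulli(delta_prob).
    """
    i = int(rng.random() < success_prob(x, alpha))  # 1 if the investment succeeds
    d = int(rng.random() < delta_prob)              # 1 if quality depreciates
    return int(np.clip(s + i - d, 1, s_bar))

# Simulate a short quality path for one firm with made-up parameters.
rng = np.random.default_rng(0)
path = [3]
for _ in range(10):
    path.append(next_quality(path[-1], x=1.0, alpha=2.0,
                             delta_prob=0.3, s_bar=5, rng=rng))
```

Because \(\alpha x / (1 + \alpha x)\) is increasing and concave in \(x\), each extra dollar raises the success probability by less than the previous one.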
Note that in Ericson and Pakes (1995) we have two separate decision variables
Does not have to be the case!
Example: Besanko et al. (2010)
Firms maximize the expected flow of discounted profits \[ \max_{\boldsymbol a} \ \mathbb E_t \left[ \sum_{\tau=0}^\infty \beta^{\tau} \pi_{j, t+\tau} (\theta^\pi) \right] \]
Equilibrium notion: Markov Perfect Equilibrium (Maskin and Tirole 1988)
One important extension is exit.
The Bellman equation of incumbent \(j\) at time \(t\) is \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{d^{exit}_{jt} \in \lbrace 0, 1 \rbrace} \Bigg\lbrace \begin{array}{c} \beta \phi^{exit} \ , \newline \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Big\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Big\rbrace \end{array} \Bigg\rbrace \] where
We can also incorporate endogenous entry.
Value function \[ V_{j}^{\boldsymbol P_{-j}} (e, \boldsymbol s_{-jt} ; \theta) = \max_{d^{entry} \in \lbrace 0,1 \rbrace } \Bigg\lbrace \begin{array}{c} 0 \ , \newline - \phi^{entry} + \beta \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} (\bar s, \boldsymbol s_{-j, t+1} ; \theta) \ \Big| \ \boldsymbol s_{t} ; \theta^f \Big] \end{array} \Bigg\rbrace \] where
Do we observe potential entrants?
Doraszelski and Satterthwaite (2010): an MPE might not exist in the Ericson and Pakes (1995) model.
Markov Perfect Bayesian Nash Equilibrium (MPBNE)
Solving the model is very similar to Rust
Where do things get complicated / tricky? Policy function update
Imagine a stylized exit game with 2 firms
Issues: value function iteration might not converge, and there can be multiple equilibria.
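Both issues already show up in a one-shot simplification of a two-firm exit game. The sketch below uses hypothetical payoffs (a firm earns `monopoly` if it stays alone, `duopoly` if both stay, and 0 if it exits); these numbers are not taken from the notes and the game is static, so it only illustrates the logic.

```python
from itertools import product

STAY, EXIT = 1, 0

def payoff(own, rival, monopoly=1.0, duopoly=-1.0):
    """Hypothetical one-shot payoffs: the market is profitable for one firm only."""
    if own == EXIT:
        return 0.0
    return monopoly if rival == EXIT else duopoly

def best_response(rival):
    """Action that maximizes own payoff given the rival's action."""
    return max((STAY, EXIT), key=lambda a: payoff(a, rival))

# Multiplicity: enumerate pure-strategy equilibria (mutual best responses).
equilibria = [(a1, a2) for a1, a2 in product((STAY, EXIT), repeat=2)
              if a1 == best_response(a2) and a2 == best_response(a1)]

# Non-convergence: simultaneous best-response updating from (STAY, STAY).
profile = (STAY, STAY)
history = [profile]
for _ in range(4):
    profile = (best_response(profile[1]), best_response(profile[0]))
    history.append(profile)
```

Here `equilibria` contains the two asymmetric outcomes in which exactly one firm stays, while `history` alternates between both firms staying and both exiting without ever reaching either of them.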
How to find them?
Can we assume them away?
What are the computational bottlenecks? \[ V^{\boldsymbol P_{-j}}_{j} ({\color{red}{\mathbf{s}_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \beta \mathbb E_{{\color{red}{\mathbf{s}_{t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \]
Note: bottlenecks are not additive but multiplicative: we have to compute the expectation at each point in the state space. Improving on either of the two helps a lot.
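A back-of-the-envelope count makes the multiplicative structure concrete. The sketch assumes \(J\) firms with \(\bar s\) quality levels each, and assumes that each firm's quality can move to at most `k` values next period (for example down, same, up); `sweep_cost` is a hypothetical helper, not part of any cited method.

```python
def sweep_cost(J, s_bar, k=3):
    """Rough operation count for one pass over the state space.

    Returns (number of states, successor states per state, their product).
    Assumption: each firm's state moves to at most k values next period.
    """
    n_states = s_bar ** J        # points at which the value function is evaluated
    n_successors = k ** J        # terms in the expectation at each point
    return n_states, n_successors, n_states * n_successors

small = sweep_cost(J=2, s_bar=10)   # (100, 9, 900)
large = sweep_cost(J=5, s_bar=10)   # (100000, 243, 24300000)
```

Going from 2 to 5 firms multiplies the number of states by 1,000 and the number of terms per expectation by 27, so the total work grows by a factor of 27,000.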
Two and a half classes of solutions:
Note: useful also to get good starting values for a full solution method!
Weintraub, Benkard, and Van Roy (2008): what if firms had no idea about the state of other firms?
The value function becomes \[ V_{j} ({\color{red}{s_{t}}}) = \max_{a_{jt} \in \mathcal{A}_j \left({\color{red}{s_{t}}}\right)} \Bigg\lbrace {\color{red}{\mathbb E_{\boldsymbol s_t}}} \Big[ \pi_{j} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) \Big| P \Big] + \beta \mathbb E_{{\color{red}{s_{t+1}}}} \Big[ V_{j} \left({\color{red}{s_{t+1}}}\right) \ \Big| \ a_{jt}, {\color{red}{s_{t}}} ; \theta^f \Big] \Bigg\rbrace \]
Doraszelski and Judd (2019): what if, instead of moving simultaneously, firms moved one at a time in random order?
The value function becomes \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}, {\color{red}{n=j}}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace {\color{red}{\frac{1}{J}}}\pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + {\color{red}{\sqrt[J]{\beta}}} \mathbb E_{{\color{red}{n, s_{j, t+1}}}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}, {\color{red}{n}} \right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg\rbrace \]
Computational gain
Doraszelski and Judd (2012): what’s the advantage of continuous time?
With continuous time, the value function becomes \[ V^{\boldsymbol P_{-j}}_{j} (\mathbf{s}_{t}) = \max_{a_{jt} \in \mathcal{A}_j \left(\mathbf{s}_{t}\right)} \Bigg\lbrace \frac{1}{\lambda(a_{jt}) - \log(\beta)} \Bigg( \pi_{j}^{\boldsymbol P_{-j}} (a_{jt}, \mathbf{s}_{t} ; \theta^\pi ) + \lambda(a_{jt}) \mathbb E_{\boldsymbol s_{t+1}} \Big[ V_{j}^{\boldsymbol P_{-j}} \left(\mathbf{s}_{t+1}\right) \ \Big| \ a_{jt}, \boldsymbol s_{t} ; \theta^f \Big] \Bigg) \Bigg\rbrace \]
Computational gain
Which method is best?
I compare them in Courthoud (2020)
Some applications of these methods include
There is one method to approximate the equilibrium in dynamic games that is a bit different from the others: Pakes and McGuire (2001)
Experience-Based Equilibrium
Players start with an alternative-specific value function
Until convergence, do:
Compute optimal action, given \(\bar V_{j, a}^{(t)} (\boldsymbol s ; \theta)\) \[ a^* = \arg \max_a \bar V_{j, a}^{(t)} (\boldsymbol s ; \theta) \]
Observe the realized payoff \(\pi_{j, a^*}(\boldsymbol s ; \theta)\) and the realized next state \(\boldsymbol {s'}(\boldsymbol s, a^*; \theta)\)
Update the alternative-specific value function of the chosen action \(a^*\) \[ \bar V_{j, a^*}^{(t+1)} (\boldsymbol s ; \theta) = (1-\alpha_{\boldsymbol s, t}) \bar V_{j, a^*}^{(t)} (\boldsymbol s ; \theta) + \alpha_{\boldsymbol s, t} \Big[\pi_{j, a^*}(\boldsymbol s ; \theta) + \beta \max_a \bar V_{j, a}^{(t)} (\boldsymbol s' ; \theta) \Big] \] where
Where is the strategic interaction?
Importance of starting values
Convergence by design
Computer Science reinforcement learning literature (AI): Q-learning
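The update step of this kind of stochastic algorithm can be sketched as a standard tabular Q-learning update. This is a minimal single-agent illustration, not the Pakes and McGuire (2001) code: the function `q_update`, the table layout (states in rows, actions in columns), and the numbers are assumptions, and the continuation value is discounted by `beta` as in standard Q-learning.

```python
import numpy as np

def q_update(V, s, a, payoff, s_next, beta, step):
    """Move V[s, a] toward the realized payoff plus the discounted best continuation value."""
    target = payoff + beta * V[s_next].max()
    V[s, a] = (1.0 - step) * V[s, a] + step * target
    return V

# Two states, two actions, values initialized at zero (made-up example).
V = np.zeros((2, 2))
V = q_update(V, s=0, a=1, payoff=1.0, s_next=1, beta=0.9, step=0.5)
V = q_update(V, s=0, a=1, payoff=1.0, s_next=1, beta=0.9, step=0.5)
```

Only the entry for the visited state and chosen action changes at each step, so the starting values of entries that are never visited are never revised.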
