Reinforcement Learning in a Prisoner's Dilemma - Games and Economic Behavior, 2024
I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed form relationship between the learning rate and game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
Learning in Markets
We show that strategic market games, the non-cooperative implementation of a matching with transfers or an assignment game, are weakly acyclic.
This property ensures that many common learning algorithms will converge to Nash equilibria in these games, and that the allocation mechanism can therefore be decentralized. Convergence hinges on the appropriate price clearing rule and has different properties for better- and best-response dynamics. We tightly characterize the robustness of this convergence in terms of so-called schedulers for both types of
dynamics.
Assignment Markets: Theory and Experiments - under review
We experimentally test convergence to the core in two-sided markets for heterogeneous indivisible goods under different trading institutions. We use bargaining and strategic games as predictors that naturally generalize the core, accommodating non-equilibrium behavior. The performance of the competing theories reflects the differences in trading procedures - market outcomes are close to Nash equilibrium predictions under auction-like institutions and close to bargaining for institutions that feature decentralized negotiations. This difference may be driving the documented effect of fewer no-trade outcomes at the expense of a higher chance of suboptimal match under free-form bargaining.
Reconstructing Strategies in Dynamic Games
The essential problem in the empirical analysis of the repeated games is to know what strategies are actually used by the players.
We propose a simple algorithm to reconstruct strategies out of the observed sequence of play.
The algorithm also accounts for the possibility of measurement and decision making errors and stays agnostic about equilibrium restrictions.
We apply the algorithm to both experimental and observational data.
Using the experimental data we conclusively show that players use strategies of memory no more than one period.
Using the observational data we confirm that Australian gas stations learn to collude using day of the week as coordination device.
Revealed Social Preferences
We use a revealed preference approach to develop tests for the observed behavior to be consistent with theories of social preferences. In particular, we provide nonparametric criteria for the observed set of choices to be generated by inequality averse preferences and increasing benevolence preferences. These tests can be applied to games commonly used to study social preferences: dictator, ultimatum, investment(trust) and carrot-stick games. We further apply these tests to experimental data on dictator and ultimatum games. Finally, we show how to identify the levels of altruism and fair outcomes using the developed revealed preference conditions.
Tractable Auctions and Existence of Competitive Equilibria
I show that an efficient allocation for an auction can be found in polynomial time if and only if it can be supported by a pricing equilibrium. In other words, a competitive market solves exactly the class of all tractable problems. The result follows directly after connecting the competitive equilibria with valued constrained satisfaction problems. This approach also offers practical methods for showing existence of Walrasian prices in classes of problems instead of instances. In other words, the competitive equilibrium is guaranteed before the bids are revealed to the auctioneer.
Mechanism Design with Memory and no Money
The paper provides an automated approach to mechanism design problems without money for arbitrary discount factors using dynamic programming and promised utility.
We illustrate the approach with problems from the literature - chore allocation or sharing an indivisible good or goods.
Additionally, we discuss the relationships between different classes of mechanisms, and show that promised utility mechanisms are more general than mappings from histories of finite memory.
Bayesian Nash Revealed
We study games of incomplete information from a revealed preference perspective and provide a nonparametric test for Bayesian Nash rationalization - existence of such expected utility representations for agents that observed choices are Bayesian Nash equilibria.
In the basic setup we assume that everything, but the cardinal utility is known by the researcher (including beliefs of players over distribution of types).
However, we discuss the possibility of relaxing several assumptions.
In particular, we consider that researcher may be unaware of the distribution of types, or number of types.
The test can also be applied with assumptions about rationality of agents that follow different theories of behavior under risk - cumulative propsect theory or rank-dependent expected utility.
Combinatorial Auction with Public Goods
We present an efficient linear program that can be used to solve winner determination problem for combinatorial auction in a market with both private and public goods. The setting can be applied to voters bidding in an interpretive environment with a propositional logic bidding language. We allow for a wide set of constraints, complement and substitute relations, and discuss several Cumulative Utility Functions. Solving the winner determination problem as a linear program proves to have a large computational advantage over other methods, which is valuable due to problem being generally NP-complete. We confirm this result with in silico experiments.