I characterize the outcomes of a class of model-free reinforcement learning algorithms, such as stateless Q-learning, in a prisoner's dilemma. The behavior is studied in the limit as players stop experimenting after sufficiently exploring their options. A closed form relationship between the learning rate and game payoffs reveals whether the players will learn to cooperate or defect. The findings have implications for algorithmic collusion and also apply to asymmetric learners with different experimentation rules.
We show that strategic market games, the non-cooperative implementation of a matching with transfers or an assignment game, are weakly acyclic.
This property ensures that many common learning algorithms will converge to Nash equilibria in these games, and that the allocation mechanism can therefore be decentralized. Convergence hinges on the appropriate price clearing rule and has different properties for better- and best-response dynamics. We tightly characterize the robustness of this convergence in terms of so-called schedulers for both types of
We experimentally test convergence to the core in two-sided markets for heterogeneous indivisible goods under different trading institutions. We use bargaining and strategic games as predictors that naturally generalize the core, accommodating non-equilibrium behavior. The performance of the competing theories reflects the differences in trading procedures - market outcomes are close to Nash equilibrium predictions under auction-like institutions and close to bargaining for institutions that feature decentralized negotiations. This difference may be driving the documented effect of fewer no-trade outcomes at the expense of a higher chance of suboptimal match under free-form bargaining.
The essential problem in the empirical analysis of the repeated games is to know what strategies are actually used by the players.
We propose a simple algorithm to reconstruct strategies out of the observed sequence of play.
The algorithm also accounts for the possibility of measurement and decision making errors and stays agnostic about equilibrium restrictions.
We apply the algorithm to both experimental and observational data.
Using the experimental data we conclusively show that players use strategies of memory no more than one period.
Using the observational data we confirm that Australian gas stations learn to collude using day of the week as coordination device.
We use a revealed preference approach to develop tests for the observed behavior to be consistent with theories of social preferences. In particular, we provide nonparametric criteria for the observed set of choices to be generated by inequality averse preferences and increasing benevolence preferences. These tests can be applied to games commonly used to study social preferences: dictator, ultimatum, investment(trust) and carrot-stick games. We further apply these tests to experimental data on dictator and ultimatum games. Finally, we show how to identify the levels of altruism and fair outcomes using the developed revealed preference conditions.
The paper provides an automated approach to mechanism design problems without money for arbitrary discount factors using dynamic programming and promised utility.
We illustrate the approach with problems from the literature - chore allocation or sharing an indivisible good or goods.
Additionally, we discuss the relationships between different classes of mechanisms, and show that promised utility mechanisms are more general than mappings from histories of finite memory.
We study games of incomplete information from a revealed preference perspective and provide a nonparametric test for Bayesian Nash rationalization - existence of such expected utility representations for agents that observed choices are Bayesian Nash equilibria.
In the basic setup we assume that everything, but the cardinal utility is known by the researcher (including beliefs of players over distribution of types).
However, we discuss the possibility of relaxing several assumptions.
In particular, we consider that researcher may be unaware of the distribution of types, or number of types.
The test can also be applied with assumptions about rationality of agents that follow different theories of behavior under risk - cumulative propsect theory or rank-dependent expected utility.
We present an efficient linear program that can be used to solve winner determination problem for combinatorial auction in a market with both private and public goods. The setting can be applied to voters bidding in an interpretive environment with a propositional logic bidding language. We allow for a wide set of constraints, complement and substitute relations, and discuss several Cumulative Utility Functions. Solving the winner determination problem as a linear program proves to have a large computational advantage over other methods, which is valuable due to problem being generally NP-complete. We confirm this result with in silico experiments.
The paper uses Thue-Morse sequence and knapsack problem to formally relate several problems of negotiations, fair division and tournament sequencing. All of the considered problems are shown to be reduced versions of partitioning problem, i.e. equally dividing items into two sets. An argument is made that it is often beneficial to set the partitioning problem as stochastic with item values drawn from a distribution, especially in the case of intangible items like skill level. Balanced alternation according to Thue-Morse sequence proves to be the better mechanic in this case for all sample sizes for common distributions. This is not true for all distributions and for settings with k>2 agents. A further improvement upon this mechanic is greedy approximation over the underlying subset-sum problem.