###

##
Reinforcement Learning in a Prisoner's Dilemma

I fully characterize the outcomes of a wide class of model-free reinforcement learning algorithms, such as Q-learning, in a prisoner’s dilemma. The behavior is studied in the limit as players explore their options sufficiently and eventually stop experimenting.
Whether the players learn to cooperate or defect can be determined in a closed form from the relationship between the learning rate and the payoffs of the game.
The results generalize to asymmetric learners and many experimentation rules with implications for the issue of algorithmic collusion.

###
Working papers

##
Learning in Markets

We show that strategic market games, the non-cooperative implementation of a matching with transfers or an assignment game, are weakly acyclic.
This property ensures that many common learning algorithms will converge to Nash equilibria in these games, and that the allocation mechanism can therefore be decentralized. Convergence hinges on the appropriate price clearing rule and has different properties for better- and best-response dynamics. We tightly characterize the robustness of this convergence in terms of so-called schedulers for both types of
dynamics.

##
Assignment Markets: Theory and Experiments - under review

This is an experimental study of an "assignment game", which captures behavior in two-sided markets for indivisible heterogenous items with unit demand and unit supply.
We show that in every Nash equilibrium of the corresponding market game, every positive probability outcome with every good traded forms a competitive equilibrium, and thus also achieves a maximum total surplus and has an optimal assignment.
However, when some goods are not traded, every outcome of a market game forms a competitive equilibrium only with respect to an economy consisting of the goods that were traded - inefficiency may arise from mis-coordination with respect to the goods that were not traded.
Experimental results show players behaving close to Nash/competitive equilibrium and bargaining model predictions, but with notable differences between different market designs.

##
Reconstructing Strategies in Dynamic Games

The essential problem in the empirical analysis of the repeated games is to know what strategies are actually used by the players.
We propose a simple algorithm to reconstruct strategies out of the observed sequence of play.
The algorithm also accounts for the possibility of measurement and decision making errors and stays agnostic about equilibrium restrictions.
We apply the algorithm to both experimental and observational data.
Using the experimental data we conclusively show that players use strategies of memory no more than one period.
Using the observational data we confirm that Australian gas stations learn to collude using day of the week as coordination device.

##
Revealed Social Preferences

We use a revealed preference approach to develop tests for the observed behavior to be consistent with theories of social preferences. In particular, we provide nonparametric criteria for the observed set of choices to be generated by inequality averse preferences and increasing benevolence preferences. These tests can be applied to games commonly used to study social preferences: dictator, ultimatum, investment(trust) and carrot-stick games. We further apply these tests to experimental data on dictator and ultimatum games. Finally, we show how to identify the levels of altruism and fair outcomes using the developed revealed preference conditions.

###
Work in progress

##
Mechanism Design with Memory and no Money

###

The paper provides an automated approach to mechanism design problems without money for arbitrary discount factors using dynamic programming and promised utility.
We illustrate the approach with problems from the literature - chore allocation or sharing an indivisible good or goods.
Additionally, we discuss the relationships between different classes of mechanisms, and show that promised utility mechanisms are more general than mappings from histories of finite memory.

##
Bayesian Nash Revealed

We study games of incomplete information from a revealed preference perspective and provide a nonparametric test for Bayesian Nash rationalization - existence of such expected utility representations for agents that observed choices are Bayesian Nash equilibria.
In the basic setup we assume that everything, but the cardinal utility is known by the researcher (including beliefs of players over distribution of types).
However, we discuss the possibility of relaxing several assumptions.
In particular, we consider that researcher may be unaware of the distribution of types, or number of types.
The test can also be applied with assumptions about rationality of agents that follow different theories of behavior under risk - cumulative propsect theory or rank-dependent expected utility.

##
Combinatorial Auction with Public Goods

We present an efficient linear program that can be used to solve winner determination problem for combinatorial auction in a market with both private and public goods. The setting can be applied to voters bidding in an interpretive environment with a propositional logic bidding language. We allow for a wide set of constraints, complement and substitute relations, and discuss several Cumulative Utility Functions. Solving the winner determination problem as a linear program proves to have a large computational advantage over other methods, which is valuable due to problem being generally NP-complete. We confirm this result with in silico experiments.

##
Equitable Sequencing and Allocation Under Uncertainty

###

The paper uses Thue-Morse sequence and knapsack problem to formally relate several problems of negotiations, fair division and tournament sequencing. All of the considered problems are shown to be reduced versions of partitioning problem, i.e. equally dividing items into two sets. An argument is made that it is often beneficial to set the partitioning problem as stochastic with item values drawn from a distribution, especially in the case of intangible items like skill level. Balanced alternation according to Thue-Morse sequence proves to be the better mechanic in this case for all sample sizes for common distributions. This is not true for all distributions and for settings with k>2 agents. A further improvement upon this mechanic is greedy approximation over the underlying subset-sum problem.