Incorporating Mechanism Design Into Decision Theory

In the previous post, we looked at one way of handling externalities: letting other agents pay you to shift your decision. And we also considered the technique of aggregating those offers into an auction. This technique of “implementing a mechanism to handle an incentive misalignment” is extremely useful and seems like a promising avenue for future improvements to decision theory.

I want to frame mechanisms as “things that reshape incentives.” Auctions, markets, and voting systems are all mechanisms: social technologies that can be invented by working backwards from a social welfare measure (a social choice theory) and designing a game such that players following their individual incentives find themselves in socially high-ranking Nash equilibria.

I suspect that incorporating mechanism design more fully into decision theory will be extremely fruitful. Yudkowsky’s probabilistic rejection algorithm[1] first identifies the socially optimal outcome (a fair Pareto optimum), and works backwards to identify a decision procedure which:

  • If universalized, leads to the socially optimal outcome

  • If best-responded to, still leads to the socially optimal outcome (stabilizing it as a Nash equilibrium)

The Robust Cooperation paper does the same thing for the Prisoners’ Dilemma. Probabilistic rejection also only uses appropriate-threats, like sometimes rejecting unfair offers even at cost to oneself, leading it to degrade gracefully when negotiators disagree about what the socially optimal outcome is. The term “non-credible threat” is named that way because a classically-rational agent would never actually pay a cost “merely” for its incentive-reshaping effect on the other players. Not all non-credible threats are appropriate, but there are times when it’s appropriate to pay costs to reshape the incentives of others.
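
To make this concrete, here is a minimal sketch (in Python, with illustrative names of my own choosing) of the core idea of probabilistic rejection as I understand it, for an Ultimatum game over a pie of 10: unfair offers are accepted only with a probability calibrated so that the proposer’s expected payoff stays slightly below what the fair split would have given them, making the fair offer their best response.

```python
def acceptance_probability(proposer_share, fair_share=5, epsilon=0.01):
    """Probability of accepting an Ultimatum-game offer under probabilistic rejection.

    Fair (or generous) offers are always accepted. Unfair offers are accepted
    just rarely enough that the proposer's *expected* payoff falls slightly
    below what the fair split would have given them, so greed doesn't pay.
    """
    if proposer_share <= fair_share:
        return 1.0
    # Proposer's expected payoff = proposer_share * p; cap it just under fair_share.
    return max(0.0, (fair_share - epsilon) / proposer_share)

# Example: the proposer keeps 7 and offers us 3.
p = acceptance_probability(proposer_share=7)
print(p, 7 * p)  # ~0.713, proposer's expected payoff ~4.99 < 5
```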

Policy Counterfactuals

There is a representation theorem by Joyce, which I understand to be something like “a Joyce-rational agent will choose an action that maximizes an expected utility expression that looks like $\mathrm{EU}(a) = \sum_o U(o) \cdot P(a \,\square\!\to o; x)$.” The Functional Decision Theory (FDT) paper frames the differences between major decision theories like CDT, EDT, and FDT as using different ways to compute the free parameter $P(a \,\square\!\to o; x)$. This parameter is an action-counterfactual, whose interpretation is something like “if I were to take this action $a$, what is the probability that outcome $o$ would occur, given some background knowledge $x$?”[2]
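
A sketch of this framing (Python, illustrative only): the skeleton of the decision procedure is fixed, and the decision theory is determined by which counterfactual function you plug in for the free parameter.

```python
from typing import Callable, Iterable

# The free parameter: P(a []-> o ; x), "if I took action a, how likely is outcome o?"
Counterfactual = Callable[[str, str, dict], float]

def best_action(actions: Iterable[str],
                outcomes: Iterable[str],
                utility: Callable[[str], float],
                counterfactual: Counterfactual,
                x: dict) -> str:
    """Joyce-style expected utility maximization, generic over the counterfactual."""
    def expected_utility(a: str) -> float:
        return sum(utility(o) * counterfactual(a, o, x) for o in outcomes)
    return max(actions, key=expected_utility)

# CDT, EDT, and FDT differ only in which `counterfactual` is supplied here:
# causal intervention, Bayesian conditioning, or "what if my decision function output a?".
```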

So we can look at “reshaping an agent’s incentives” through the lens of “reshaping an agent’s counterfactual expectations.” We can think of other agents performing this scan over policies they could implement, looking for what will elicit the most desirable response from our decision theory. And the structure of our decision theory determines their counterfactual expectations about what those responses are.
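
For example, an agent with legible access to our decision theory can search over its own policies, using our decision procedure as a subroutine to predict our response to each one (a Python sketch with hypothetical names):

```python
def best_response(their_policies, our_decision_theory, their_utility):
    """Pick the policy that elicits the most desirable response from our (legible) decision theory."""
    def value(policy):
        our_response = our_decision_theory(policy)  # legibility: they can just compute this
        return their_utility(policy, our_response)
    return max(their_policies, key=value)
```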

Hardening Your Decision Theory

From this perspective, the assumption that other agents have legible open-source access to our decision theory forces us to harden the attack surface of our decision theory, rather than relying on security through obscurity.

Giving in to inappropriate-threats like blackmail is a software vulnerability that another decision theory might exploit. The same goes for failing to make appropriate-threats, like accepting any positive offer in the Ultimatum game or unconditionally Cooperating in every round of an infinitely-iterated Prisoners’ Dilemma. (We want our decision theory to implement a strategy more like tit-for-tat, which Cooperates conditional on reciprocal Cooperation, or some other compensation.)
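
As an illustration, here is a minimal tit-for-tat policy for the iterated Prisoners’ Dilemma: it Cooperates conditional on reciprocal Cooperation rather than unconditionally.

```python
def tit_for_tat(history):
    """Iterated Prisoners' Dilemma policy: cooperate first, then mirror the opponent's last move.

    `history` is the list of the opponent's past moves, each "C" or "D".
    Cooperation is conditional, so unconditional Defectors are not rewarded.
    """
    if not history:
        return "C"      # open with Cooperation
    return history[-1]  # then reciprocate whatever they did last round
```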

We want to design our decision theory so that, to the greatest extent possible, other agents find that best-responding to our decision theory leads to high-ranking outcomes according to our social choice theory.

Mechanism Counterfactuals

Software systems can simulate any computable mechanism given enough computing power. This is a sufficient condition for such a mechanism to influence the behavior of software agents; such agents can perform a logical handshake to act as if a mechanism “exists”, even if it is not “really” implemented.

How do you act “as if” a voting system exists? By imagining how everyone would vote, if such a voting system existed, and then acting in accordance with the results. The same works for auctions, markets, negotiations, anything that reshapes incentives compared with the underlying strategic context. And these can be composed together into networks, like first imagining “as if” private property ownership-tags exist, and then imagining “as if” there were a market for the goods on the underlying consensus-imaginary property rights layer.
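
A toy sketch of a counterfactual mechanism (Python, with hypothetical names such as `predict_vote`): no ballot box exists anywhere, but each agent runs the same tally over its model of how everyone would vote, and then acts on the imagined result.

```python
from collections import Counter

def counterfactual_vote(agents, options):
    """Act 'as if' a plurality vote happened: each agent predicts everyone's ballot and tallies them.

    `agents` maps each agent's name to a model that can predict any agent's vote.
    If the models agree (a consensus model), every agent computes the same winner
    and can condition its behavior on it, with no actual voting system deployed.
    """
    results = {}
    for name, model in agents.items():
        ballots = [model.predict_vote(voter, options) for voter in agents]
        results[name] = Counter(ballots).most_common(1)[0][0]
    return results  # ideally every agent arrives at the same winner
```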

State Channels

One specific architecture for this sort of thing is the state channel: a relatively simple smart contract serves as a dispute resolution mechanism for the whole scheme. Alice and Bob each deposit some funds with this smart contract, and then conduct most of their activity together without involving the blockchain at all. Alice and Bob exchange signed messages with each other, which enables them to prove the authenticity of these messages to the dispute resolution smart contract.

One of the most valuable features of a state channel is that it shapes the counterfactual expectations of each participant, so that each can safely treat the interaction “as if” it were happening with all of the security guarantees of the underlying blockchain, without all of the overhead. This includes being able to act “as if” further smart contracts had been deployed to the blockchain, and these virtual smart contracts can be composed together into networks within the state channel.

At the end of their interaction, Alice and Bob can inform the smart contract of the result and withdraw whatever funds they’re entitled to. Neither has an incentive to distort the report in their favor, because each has enough cryptographic information to prove the actual result in the event of a dispute.
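
A heavily simplified sketch of the off-chain half of this protocol (Python; real state channels use on-chain contracts and proper digital signatures, so the `hmac` “signatures”, field names, and the `verify` callback here are stand-ins of my own):

```python
import hmac, hashlib, json

def sign_state(secret_key: bytes, state: dict) -> str:
    """Toy stand-in for a digital signature over a channel state."""
    payload = json.dumps(state, sort_keys=True).encode()
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()

# Alice and Bob keep exchanging signed states off-chain; each new state carries a higher nonce,
# e.g. state = {"nonce": 7, "balances": {"alice": 4, "bob": 6}}.
def settle(candidate_states, verify):
    """Dispute resolution: pay out according to the highest-nonce state signed by both parties.

    `candidate_states` is a list of (state, signatures) pairs submitted during a dispute;
    `verify(state, signatures)` is whatever check the on-chain contract would apply.
    """
    valid = [s for s, sigs in candidate_states if verify(s, sigs)]
    final = max(valid, key=lambda s: s["nonce"])
    return final["balances"]
```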

Logical Commitments

Software systems with legible access to each other’s source code don’t even need the overhead of a blockchain to hold a consensus model of other software systems in their heads. The legibility makes this logical line-of-sight transitive; any software system that AliceBot reasons about, BobBot can also reason about.

A general-purpose technology we’ll want for open-source game theory is a logical commitment. (Or just commitment when it’s clear from context.) When AliceBot can implement any policy $\pi$ in a policy space $\Pi$, a logical commitment $C \subseteq \Pi$ is the legible fact that AliceBot will only implement a policy from this subset. When $C = \Pi$, this corresponds to the null commitment “I will implement a policy $\pi \in \Pi$.” When $C = \{\pi\}$, this corresponds to the very specific commitment “I will implement exactly the policy $\pi$.”

We’ll also want conditional commitments, which apply if some condition is true. FairBot offers the conditional commitment “if I can prove that you’ll Cooperate with me, I’ll Cooperate with you.” It also offers the complementary commitment to cover the case where such a proof search fails: “In that case, I’ll Defect.” For any domain where a decision theory has a defined output, it is implicitly making commitments and conditional commitments.

Finally, a joint commitment is a subset $C \subseteq \Pi_1 \times \cdots \times \Pi_n$ of a joint policy space, and represents a commitment for each corresponding player. These can also be conditional.
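
A minimal way to represent these objects in code (a Python sketch; the predicate-based encoding is my own illustrative choice, not from the original): a commitment is just a legible predicate picking out a subset of the policy space, and a conditional commitment selects between such predicates depending on whether its condition holds.

```python
from typing import Callable, Set

Policy = str  # stand-in for whatever type policies actually have

# A commitment is a legible predicate carving out a subset C of the policy space.
Commitment = Callable[[Policy], bool]

null_commitment: Commitment = lambda policy: True            # C = the whole policy space
exact_commitment = lambda pi: (lambda policy: policy == pi)  # C = {pi}

def conditional_commitment(condition_holds: bool,
                           if_true: Commitment,
                           if_false: Commitment) -> Commitment:
    """FairBot-style: 'if I can prove you'll Cooperate, I Cooperate; otherwise I Defect.'"""
    return if_true if condition_holds else if_false

def joint_commitment(allowed_profiles: Set[tuple]) -> Callable[[tuple], bool]:
    """A joint commitment is a subset of the joint policy space (tuples, one policy per player)."""
    return lambda profile: profile in allowed_profiles
```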

In the next post we’ll see an example of how networks of counterfactual mechanisms can be used to produce useful logical commitments.

  1. ^

    The algorithm is described in Project Lawful (so, spoilers), but it’s here, and it’s discussed without spoilers earlier in this sequence.

  2. ^

    I assume the theorem still makes sense if you think of agents as optimizing their global policy rather than their local action after Bayesian updating, but I haven’t been able to look at the original paper. It looks to be available behind a paywall here. But policy optimization currently seems like the obvious way to go, so I’ll go back to talking about policy counterfactuals.

    EDIT: gwern has put up a copy of the relevant chapter; see the discussion here for more details.