Superrational Agents Kelly Bet Influence!
As a follow-up to the Walled Garden discussion about Kelly betting, Scott Garrabrant made some super-informal conjectures to me privately, involving the idea that some class of “nice” agents would “Kelly bet influence”, where “influence” had something to do with anthropics and acausal trade.
I was pretty incredulous at the time. However, as soon as he left the discussion, I came up with an argument for a similar fact. (The following does not perfectly reflect what Scott had in mind, by any means. His notion of “influence” was very different, for a start.)
The meat of my argument is just Critch’s negotiable RL theorem. In fact, that’s practically the entirety of my argument. I’m just thinking about the consequences in a different way from how I have before.
Rather than articulating a real decision theory that deals with all the questions of acausal trade, bargaining, commitment races, etc., I’m just going to imagine a class of superrational agents which solve these problems somehow. These agents “handshake” with each other and negotiate (perhaps acausally) a policy which is Pareto-optimal with respect to each of their preferences.
Critch’s negotiable RL result studies the question of what an AI should do if it must serve multiple masters. For this post, I’ll refer to the masters as “coalition members”.
He shows the following:
Any policy which is Pareto-optimal with respect to the preferences of coalition members can be understood as doing the following. Each coalition member is assigned a starting weight, with weights summing to one. At each decision, the action is selected via the weighted average of the preferences of each coalition member, according to the current weights. At each observation, the weights are updated via Bayes’ Law, based on the beliefs of coalition members.
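To make the structure of the theorem concrete, here is a minimal toy sketch, not Critch’s own construction: a hypothetical two-member coalition (“alice” and “bob”, with made-up beliefs and utilities) faces one observation and then one action. It checks by brute force that acting on Bayes-updated weights picks the same policy as maximizing the prior-weighted sum of the members’ expected utilities (each computed under that member’s own beliefs), which is one standard way to land on a Pareto-optimal policy. Exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical coalition; all numbers are made up for illustration.
prior_weight = {"alice": F(1, 2), "bob": F(1, 2)}
# Each member's probability for the single observation the agent will see.
belief = {
    "alice": {"rain": F(8, 10), "sun": F(2, 10)},
    "bob": {"rain": F(3, 10), "sun": F(7, 10)},
}
# Each member's utility depends only on which action is finally taken.
utility = {
    "alice": {"A": F(1), "B": F(0), "C": F(1, 2)},
    "bob": {"A": F(0), "B": F(1), "C": F(1, 2)},
}
members = list(prior_weight)
observations = ["rain", "sun"]
actions = ["A", "B", "C"]


def posterior_weights(obs):
    """Bayes-update each member's weight by how likely that member found `obs`."""
    unnorm = {m: prior_weight[m] * belief[m][obs] for m in members}
    total = sum(unnorm.values())
    return {m: w / total for m, w in unnorm.items()}


def weighted_choice(obs):
    """Pick the action maximizing the posterior-weighted sum of utilities."""
    w = posterior_weights(obs)
    return max(actions, key=lambda a: sum(w[m] * utility[m][a] for m in members))


def expected_utility(member, policy):
    """Member's expected utility of a policy (dict obs -> action), under their own beliefs."""
    return sum(belief[member][o] * utility[member][policy[o]] for o in observations)


# Brute force over every policy: maximize the prior-weighted sum of expected utilities.
all_policies = [dict(zip(observations, acts)) for acts in product(actions, repeat=len(observations))]
best = max(all_policies, key=lambda pi: sum(prior_weight[m] * expected_utility(m, pi) for m in members))

# Policy built from "update weights by Bayes, then act on the weighted average".
update_policy = {o: weighted_choice(o) for o in observations}
assert update_policy == best
```

Here the member who predicted the observation better gets more say in the final action, which is the “betting” reading of the weight update discussed below.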
He was studying what an AI’s policy should be when serving the coalition members; however, we can apply this result to a coalition of superrational agents who are settling on their own policy, rather than constructing a robotic servant.
Critch remarks that we can imagine the weight update as the result of bets which the coalition members would make with each other. I’ve known about this for a long time, and it made intuitive sense to me that they’ll happily bet on their beliefs; so, of course they’ll gain/lose influence in the coalition based on good/bad predictions.
What I didn’t think too hard about was how they end up betting. Sure, the fact that it’s equivalent to a Bayesian update is remarkable. But it makes sense once you think about the proof.
Or does it?
To foreshadow: the proof works from the assumption of Pareto optimality. So it collectively makes sense for the agents to bet this way. But the “of course it makes sense for them to bet on their beliefs” line of thinking tricks you into assuming that it individually makes sense for the agents to bet like this. However, this need not be the case.
Kelly Betting & Bayes
The Kelly betting fraction can be written as:
$$f = p - \frac{1-p}{r-1}$$
Where p is your probability of winning, and r is the return rate if you win (i.e., if you stand to double your money, r=2; etc.).
Now, it turns out, betting f of your money (and keeping the rest in reserve) is equivalent to betting p of your money and putting (1-p) on the other side of the bet. Betting against yourself is a pretty silly thing to do, but since you’ll win either way, there’s no problem:
Betting $f$ of your money:
If you win, you’ve got $fr$, plus the $1-f$ you held.
So the sum is $fr + (1-f) = 1 + f(r-1) = 1 + p(r-1) - (1-p) = pr$ of your initial money.
If you lose, you’ve still got $1-f$ of what you had.
So this is just $1 - f = (1-p) + \frac{1-p}{r-1} = (1-p)\frac{r}{r-1}$.
Betting against yourself, with fractions like your beliefs:
If you win, you’ve got $pr$ of your money.
If you lose, the payoff ratio (assuming you can get the reverse odds for the reverse bet) is $\frac{r}{r-1}$. So, since you put down $1-p$, you get $(1-p)\frac{r}{r-1}$.
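A quick exact check of the two payoff calculations, as a sketch: the hypothetical helpers below compute the wealth multiplier in each outcome for Kelly betting with a reserve (using f = p - (1-p)/(r-1)) and for the bet-against-yourself strategy, assuming the reverse bet pays a gross return of r/(r-1). The test values of p and r are arbitrary choices for which f is positive.

```python
from fractions import Fraction as F


def kelly_fraction(p, r):
    """Kelly fraction f = p - (1 - p) / (r - 1) for win probability p and gross return r."""
    return p - (1 - p) / (r - 1)


def kelly_with_reserve(p, r):
    """Wealth multipliers (if win, if lose) when betting f and holding 1 - f back."""
    f = kelly_fraction(p, r)
    return f * r + (1 - f), 1 - f


def bet_against_yourself(p, r):
    """Wealth multipliers (if win, if lose) when staking p on the bet and 1 - p on the reverse.

    Assumes the reverse bet is offered at the reverse odds, i.e. gross return r / (r - 1).
    """
    reverse_return = r / (r - 1)
    return p * r, (1 - p) * reverse_return


# Arbitrary (p, r) pairs; Fractions make the equality exact rather than approximate.
for p, r in [(F(6, 10), F(2)), (F(1, 3), F(4)), (F(9, 10), F(3, 2))]:
    assert kelly_with_reserve(p, r) == bet_against_yourself(p, r)
```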
But now imagine that a bunch of bettors are using the second strategy to make bets with each other, with the “house odds” being the weighted average of all their beliefs (weighted by their bankrolls, that is). Aside from the betting-against-yourself part, this is a pretty natural thing to do: these are the “house odds” which make the house revenue-neutral, so the house never has to dig into its own pockets to award winnings.
You can imagine that everyone is putting money on two different sides of a table, to indicate their bets. When the bet is resolved, the losing side is pushed over to the winning side, and everyone who put money on the winning side picks up a fraction of money proportional to the fraction they originally contributed to that side. (And since payoffs of the bet-against-yourself strategy are exactly identical to Kelly betting payoffs, a bunch of Kelly bets at house odds rearrange money in exactly the same way as this.)
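The table mechanism can be sketched with made-up numbers (three hypothetical bettors; the bankrolls and beliefs are illustrative only). Setting the house probability q to the bankroll-weighted average belief, paying each winning stake at gross return 1/q (or 1/(1-q) for the other side) hands out exactly the pot, split in proportion to each bettor’s contribution to the winning side.

```python
from fractions import Fraction as F

# Hypothetical bankrolls (summing to 1) and each bettor's probability of "yes".
bankroll = [F(1, 2), F(3, 10), F(1, 5)]
belief = [F(9, 10), F(1, 2), F(1, 5)]

# House probability: the bankroll-weighted average of everyone's beliefs.
q = sum(w * p for w, p in zip(bankroll, belief))

# Bet-against-yourself stakes: w*p on the "yes" side, w*(1-p) on the "no" side.
yes_side = [w * p for w, p in zip(bankroll, belief)]
no_side = [w * (1 - p) for w, p in zip(bankroll, belief)]
pot = sum(yes_side) + sum(no_side)

# Table mechanism: the whole pot goes to the winning side, split pro rata.
table_if_yes = [pot * s / sum(yes_side) for s in yes_side]
table_if_no = [pot * s / sum(no_side) for s in no_side]

# Paying each winning stake at house odds (gross return 1/q or 1/(1-q)) gives the same split.
odds_if_yes = [s / q for s in yes_side]
odds_if_no = [s / (1 - q) for s in no_side]

assert table_if_yes == odds_if_yes and table_if_no == odds_if_no
assert sum(odds_if_yes) == pot == sum(odds_if_no)  # the house is revenue-neutral
```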
But this is clearly equivalent to how hypotheses redistribute weight during Bayesian updates!
So, a market of Kelly bettors re-distributes money according to Bayesian updates.
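The Kelly-Bayes correspondence can be checked directly in a sketch using the same kind of hypothetical bettors: each one Kelly bets with a reserve at house odds, and the new bankrolls are compared with Bayesian posterior weights, treating bankrolls as prior weights and beliefs as likelihoods. For bettors whose belief is below the house probability q, the Kelly formula gives a negative f; the signed two-outcome formulas then produce the same payoffs as a Kelly bet of (q - p)/q on the other side, so they are used for everyone.

```python
from fractions import Fraction as F

# Hypothetical bankrolls (= prior weights) and each bettor's probability of the event.
bankroll = [F(1, 2), F(3, 10), F(1, 5)]
belief = [F(9, 10), F(1, 2), F(1, 5)]

q = sum(w * p for w, p in zip(bankroll, belief))  # house odds: revenue-neutral probability
r = 1 / q  # gross return on a winning "yes" bet


def kelly_wealth(w, p, event_happened):
    """Wealth after a Kelly bet (fraction f, rest held in reserve) at house odds.

    If f < 0 (p < q), these signed formulas give the same payoffs as a Kelly
    bet of (q - p) / q on "no".
    """
    f = p - (1 - p) / (r - 1)
    return w * (1 + f * (r - 1)) if event_happened else w * (1 - f)


def bayes_posterior(event_happened):
    """Posterior weights: prior weight times each bettor's likelihood, normalized."""
    likelihood = [p if event_happened else 1 - p for p in belief]
    unnorm = [w * lk for w, lk in zip(bankroll, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


for outcome in (True, False):
    wealth = [kelly_wealth(w, p, outcome) for w, p in zip(bankroll, belief)]
    assert sum(wealth) == 1  # the house neither gains nor loses
    assert wealth == bayes_posterior(outcome)
```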
Therefore, we can interpret the superrational coalition members as betting their coalition weight, according to the Kelly criterion.
But, this is a pretty weird thing to do!
I’ve argued that the main sensible justification for using the Kelly criterion is if you have utility logarithmic in wealth. Here, this translates to utility logarithmic in coalition weight.
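For reference, the standard single-bet calculation behind the log-utility justification: with starting wealth (here, coalition weight) normalized to 1, p and r as in the Kelly section, and G(f) the expected log wealth after betting a fraction f, the stationary point is the Kelly fraction. Since G is concave in f, that stationary point is the maximum.

```latex
\begin{aligned}
G(f) &= p \log\bigl(1 + f(r-1)\bigr) + (1-p)\log(1-f) \\
G'(f) &= \frac{p(r-1)}{1 + f(r-1)} - \frac{1-p}{1-f} = 0 \\
&\Rightarrow\; p(r-1)(1-f) = (1-p)\bigl(1 + f(r-1)\bigr) \\
&\Rightarrow\; p(r-1) - (1-p) = f(r-1) \\
&\Rightarrow\; f = p - \frac{1-p}{r-1}
\end{aligned}
```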
It’s possible that under some reasonable assumptions about the world, we can argue that utility of coalition members will end up approximately logarithmic. But Critch’s theorem applies to lots of situations, including small ones where there isn’t any possibility for weird things to happen over long chains of bets as in some arguments for Kelly.
Typically, final utility will not even be continuous in coalition weight: small changes in coalition weight often won’t change the optimal strategy at all, but at select tipping points, the optimal strategy will totally change to reflect the reconfigured trade-offs between preferences.
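A toy illustration of such tipping points, with a hypothetical two-member coalition and made-up utilities over three actions, where C is a compromise both members rate at 3/5. Alice’s utility from the chosen action, as a function of her final coalition weight w, is a step function: 0 for w below 2/5, 3/5 between 2/5 and 3/5, and 1 above 3/5 (exact ties at the boundaries are broken by dict order). This is nothing like log(w).

```python
from fractions import Fraction as F

# Hypothetical utilities; "C" is a compromise both members rate at 3/5.
utility = {
    "alice": {"A": F(1), "B": F(0), "C": F(3, 5)},
    "bob": {"A": F(0), "B": F(1), "C": F(3, 5)},
}


def chosen_action(w_alice):
    """Action maximizing the weighted sum of utilities, given Alice's weight."""
    weights = {"alice": w_alice, "bob": 1 - w_alice}
    # max() keeps the first action in dict order when scores tie.
    return max(
        utility["alice"],
        key=lambda a: sum(weights[m] * utility[m][a] for m in weights),
    )


def alice_utility(w_alice):
    """Alice's utility from whatever action her final weight leads to."""
    return utility["alice"][chosen_action(w_alice)]


for w in (F(1, 4), F(9, 20), F(11, 20), F(3, 4)):
    print(w, chosen_action(w), alice_utility(w))
```

Moving w from 9/20 to 11/20 changes nothing for Alice, while crossing 3/5 jumps her utility from 3/5 to 1.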
Intuitively, these tipping points should factor significantly in a coalition member’s betting strategy; you’d be totally indifferent to small bets which can’t change anything, but avoid specific transitions strongly, and seek out others. If the coalition members were betting based on their selfish preferences, this would be the case.
Yet, the coalition members end up betting according to a very simple formula, which does not account for any of this.
We can’t justify this betting behavior from a selfish perspective (that is, not with the usual decision theories); as I said, the bets don’t make sense.
But we’re not dealing with selfish agents. These agents are acting according to a Pareto-optimal policy.
And that’s ultimately the perspective we can justify the bets from: these are altruistically motivated bets. Exchanging coalition weight in this way is best for everyone. It keeps you Pareto-optimal!
This is very counterintuitive. I suspect most people would agree with me that there seems to be no reason to bet, if you’re being altruistic rather than selfish. Not so! They’re not betting for their personal benefit. They’re betting for the common good!
Of course, that fact is a very straightforward consequence of Critch’s theorem. It shouldn’t be surprising. Yet, somehow, it didn’t stick out to me in quite this way. I was too stuck in the frame of trying to interpret the bets selfishly, as Pareto-improvements which both sides happily agree to.
I’m quite curious whether we can say anything interesting about how altruistic agents would handle money, based on this. I don’t think it means altruists should Kelly bet money; money is a very different thing from coalition weight. Coalition weights are like exchange rates or prices. Money is more of a thing being exchanged. You do not pay coalition weight in order to get things done.