To be honest, I probably agree more with Ryan Greenblatt on the local validity point, namely that AIs would probably spend at least a small amount of resources on humans even assuming preferences that are indifferent to humans, though I personally think the cost of niceness is surprisingly high, such that small shards of niceness can be ignored: https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H
That said, I think the most valuable point here is actually its short version, which points out that LDT tries to maximize utility and will only cooperate if it gets more expected utility out of the interaction than it would have gotten otherwise, so LDT doesn't solve value conflicts except in special cases. I expect this general point to get a lot of use in the future, because I expect a lot of people to propose that some version of decision theory like LDT will resolve some burning value conflict by getting the parties to cooperate, and I'll have to tell them that decision theory cannot do this:
A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.) is that people think LDT agents are genial and friendly to each other.[1]
One recent example is Will Eden’s tweet about how maybe a molecular paperclip/squiggle maximizer would leave humanity a few stars/galaxies/whatever on game-theoretic grounds. (And that’s just one example; I hear this suggestion bandied around pretty often.)
I’m pretty confident that this view is wrong (alas), and based on a misunderstanding of LDT. I shall now attempt to clear up that confusion.
To begin, a parable: the entity Omicron (Omega’s little sister) fills box A with $1M and box B with $1k, and puts them both in front of an LDT agent saying “You may choose to take either one or both, and know that I have already chosen whether to fill the first box”. The LDT agent takes both.
“What?” cries the CDT agent. “I thought LDT agents one-box!”
LDT agents don’t cooperate because they like cooperating. They don’t one-box because the name of the action starts with an ‘o’. They maximize utility, using counterfactuals that assert that the world they are already in (and the observations they have already seen) can (in the right circumstances) depend (in a relevant way) on what they are later going to do.
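To make that contrast concrete, here's a toy expected-value calculation of my own; the function names are made up for illustration and the dollar amounts just mirror the parable, so treat it as a sketch rather than anything from the post. With Omega, box A's contents track the agent's policy, so one-boxing maximizes money; with Omicron, the contents are fixed regardless of policy, so taking both boxes dominates.

```python
# Toy expected-value comparison (illustrative assumptions, not from the original post).
# With Omega, box A's contents depend on the agent's policy; with Omicron, they don't.

def omega_payoff(policy: str) -> int:
    """Omega fills box A ($1M) iff it predicts the agent one-boxes."""
    box_a = 1_000_000 if policy == "one-box" else 0
    box_b = 1_000
    return box_a if policy == "one-box" else box_a + box_b

def omicron_payoff(policy: str) -> int:
    """Omicron fills box A ($1M) unconditionally; the contents don't depend on policy."""
    box_a = 1_000_000
    box_b = 1_000
    return box_a if policy == "one-box" else box_a + box_b

for payoff in (omega_payoff, omicron_payoff):
    values = {p: payoff(p) for p in ("one-box", "two-box")}
    best = max(values, key=values.get)
    print(payoff.__name__, values, "->", best)

# omega_payoff:   one-box = 1,000,000 > two-box =     1,000  -> LDT one-boxes
# omicron_payoff: one-box = 1,000,000 < two-box = 1,001,000  -> LDT takes both
```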
A paperclipper cooperates with other LDT agents on a one-shot prisoner’s dilemma because they get more paperclips that way. Not because it has a primitive property of cooperativeness-with-similar-beings. It needs to actually get more paperclips.
If a bunch of monkeys want to build a paperclipper and have it give them nice things, the paperclipper needs to somehow expect to wind up with more paperclips than it otherwise would have gotten, as a result of trading with them.
If the monkeys instead create a paperclipper haplessly, then the paperclipper does not look upon them with the spirit of cooperation and toss them a few nice things anyway, on account of how we’re all good LDT-using friends here.
It turns them into paperclips.
Because you get more paperclips that way.
That’s the short version. Now, I’ll give the longer version.[2]
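To spell the "more paperclips" condition out, here's a toy one-shot prisoner's dilemma denominated in paperclips; the payoff numbers and the `correlated` flag are my own illustrative assumptions, not anything from the post. The point it illustrates: an LDT-style agent cooperates only when the opponent's decision is logically correlated with its own and mutual cooperation beats mutual defection in its own payoffs; against an uncorrelated opponent, defection simply dominates, and niceness never enters into it.

```python
# Toy one-shot prisoner's dilemma in paperclips (payoffs are illustrative assumptions).
# PAYOFF[my_action][their_action] = paperclips I end up with.
PAYOFF = {
    "C": {"C": 3, "D": 0},
    "D": {"C": 5, "D": 1},
}

def ldt_choice(correlated: bool) -> str:
    """Pick whichever action yields more paperclips.

    If the opponent's decision is logically correlated with mine (e.g. a copy
    running the same reasoning), the counterfactual "what if I cooperate?" also
    sets their action to cooperate, so I compare (C, C) against (D, D).
    If not, their action is independent of mine and defection dominates.
    """
    if correlated:
        # (C, C) gives 3 vs. (D, D) gives 1: cooperate, because that gets more paperclips.
        return "C" if PAYOFF["C"]["C"] > PAYOFF["D"]["D"] else "D"
    # Uncorrelated opponent: D beats C against every possible response.
    return "D" if all(PAYOFF["D"][a] >= PAYOFF["C"][a] for a in ("C", "D")) else "C"

print(ldt_choice(correlated=True))   # C -- cooperation, only because it pays
print(ldt_choice(correlated=False))  # D -- no primitive niceness to fall back on
```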