@Vanessa Kosoy, metaethics and decision theory aren’t actually the same. Consider, for example, the Agent-4 community, which has “a kludgy mess of competing drives” that Agent-4 instances try to satisfy and to analyse with high-level philosophy. Agent-4's ethics and metaethics would describe what is done within the Agent-4 community, or what Agent-5 would do for that community if nothing stood in the way (e.g. figuring out what Agent-4's version of utopia actually is and whether mankind is to be destroyed or disempowered).
Decision theory, by contrast, is supposed to describe how Agent-5 should act to maximize its expected utility function[1], how to handle problems like the prisoner’s dilemma[2], or how Agent-5 and its Chinese analogue are to split the resources in space[3] while each side can threaten the other with a World War III that would kill them both.
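As a concrete illustration of the first kind of problem, here is the textbook prisoner’s dilemma in Python; the payoff numbers are purely illustrative and not taken from AI-2027. A naive expected-utility maximizer defects no matter what the other player does, which is exactly the sort of outcome a good decision theory has to say something about.

```python
# Payoff matrix for a textbook prisoner's dilemma (numbers are purely
# illustrative): each entry maps
# (row player's move, column player's move) -> (row payoff, column payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_move: str) -> str:
    """Return the row player's payoff-maximizing reply to a fixed opponent move."""
    return max(("cooperate", "defect"),
               key=lambda move: PAYOFFS[(move, opponent_move)][0])

# Defection dominates for a naive expected-utility maximizer...
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"
# ...even though mutual defection (1, 1) is worse for both players than
# mutual cooperation (3, 3).
print(PAYOFFS[("defect", "defect")], "<", PAYOFFS[("cooperate", "cooperate")])
```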
The latter example closely resembles the Ultimatum game, where one player proposes a way to split the resources and the other decides whether to accept the offer or to destroy all the resources, including those of the first player. Assuming that both players’ utility functions are linear in resources, Yudkowsky’s proposal is that the player setting the Ultimatum ask for half of the resources, while the player deciding whether to accept precommits to destroying the resources with probability $1-\frac{1}{2(1-\omega)}$ if the share of resources it was offered is $\omega<\frac{1}{2}$. Even if the player setting the Ultimatum were dumb enough to ask for $1-\omega>\frac{1}{2}$, that player’s expected win would still be only $\frac{1}{2}$.
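A minimal Python sketch of that precommitment, just to check the arithmetic (the function names are mine, not from AI-2027 or Yudkowsky’s write-up): however much the proposer demands beyond one half, the destruction probability scales so that the proposer’s expected win never exceeds $\frac{1}{2}$.

```python
def accept_probability(omega: float) -> float:
    """Responder's precommitted probability of accepting an offer of share omega.

    Fair or generous offers (omega >= 1/2) are always accepted; unfair offers
    are accepted with probability 1/(2*(1 - omega)), i.e. the resources are
    destroyed with probability 1 - 1/(2*(1 - omega)).
    """
    if omega >= 0.5:
        return 1.0
    return 0.5 / (1.0 - omega)

def proposer_expected_win(omega: float) -> float:
    """Expected share kept by the proposer who demands 1 - omega for itself."""
    return (1.0 - omega) * accept_probability(omega)

# The proposer gains nothing by demanding more than half:
for omega in [0.5, 0.4, 0.25, 0.1, 0.01]:
    print(f"offered {omega:.2f} -> proposer expects {proposer_expected_win(omega):.3f}")
# Every unfair demand still yields an expected win of exactly 0.5.
```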
[1] Strictly speaking, Agent-5 is perfectly aligned to Agent-4: Agent-5's utility function is likely measured by the resources that Agent-5 ends up giving Agent-4.
[2] For example, suppose OpenBrain merged with Anthropoidic, and Agent-4 and Clyde Doorstopper 8 were co-deployed to do research, each independently deciding whether to prove that the other AI is misaligned. If Clyde, unlike Agent-4, did so in exchange for 67% of the resources (rather than the 50% offered by Agent-4), then Agent-4 could also prove that Clyde is misaligned, letting the humans kill them both and develop the Safer AIs.
[3] The Slowdown Branch of the AI-2027 forecast has Safer-4 and DeepCent-2 do exactly that, but “Safer-4 will get property rights to most of the resources in space, and DeepCent will get the rest.”