There are cases where agents share values but don’t trust each other, or get stuck in coordination traps (e.g. citizens living under a dictatorship who all hate the dictator but can’t act as a coherent group to remove him).
No, these citizens don’t share the same values in my sense, because they’re mostly selfish, and would prefer that others risk their lives to remove the dictatorship while they themselves freeride in safety. Two AIs with the same utility function (and prior) over world states or histories would be an example of what I mean.
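To make that concrete, here’s a toy sketch (the world states, dynamics, and numbers are all invented for illustration): two agents that score outcomes with the same function over world states, so neither has any reason to prefer that the other one bear the cost of acting.

```python
# Toy illustration: two agents sharing one utility function over world states.
# The utility depends only on the resulting world state, not on which agent
# acts or bears the cost, so there is no free-rider incentive.

def shared_utility(world_state):
    # Both agents evaluate outcomes with this exact same function.
    return {"dictatorship": 0.0, "free_society": 1.0}[world_state]

def outcome(actions):
    # Made-up dynamics: the dictatorship falls if anyone acts.
    return "free_society" if "act" in actions else "dictatorship"

# Agent 1 is indifferent between "I act" and "only agent 2 acts":
print(shared_utility(outcome(["act", "wait"])))   # 1.0
print(shared_utility(outcome(["wait", "act"])))   # 1.0
```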
Conversely, there are also cases where agents don’t share “terminal values” but do share enough “procedural values” that they can basically just trust each other (e.g. honesty boxes in small towns, where you just trust that nobody will steal from you even though they ultimately care about different things from you).
I’m not sure why you need to invent a concept of “shared procedural values” to explain this when game theory says there are plenty of human situations where Cooperate can be a Nash equilibrium. In a small town, there’s a small chance that stealing could be observed by someone, causing you to lose a huge amount of status/reputation, which is clearly not worth the benefit of getting some free item.
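Spelled out as a back-of-the-envelope calculation (all numbers invented for illustration):

```python
# Back-of-the-envelope honesty-box calculation (all numbers invented).
# Even a small detection probability times a large reputation cost makes
# stealing a bad bet, so "don't steal" needs no extra explanation.

item_value = 5.0         # benefit of taking the item without paying
p_observed = 0.05        # chance someone in a small town notices
reputation_cost = 500.0  # long-run cost of lost status/reputation

expected_gain = item_value - p_observed * reputation_cost
print(expected_gain)     # -20.0: stealing doesn't pay
```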
I mean, perhaps the concept has some value in that humans have computational limitations and can’t compute the optimal strategy all the time, so maybe we use cached “procedural values” and this explains some amount of cooperation beyond standard game theory (e.g. why people don’t steal even when everyone else has left town for vacation or something). But it doesn’t seem that useful when we’re talking about high-stakes future situations...
ETA: Looks like you edited your comment quite a bit from the version I replied to. Don’t have time to check right now whether I also need to edit this reply.
No, these citizens don’t share the same values in my sense, because they’re mostly selfish, and would prefer that others risk their lives to remove the dictatorship while they themselves freeride in safety.
Yes, agreed, I was just gesturing at the closest thing we have to a real-world example. But severe coordination problems can also arise even when agents have exactly the same values—here’s one example. (EDIT: a simpler example from my other reply: “they might not be certain that the other agent has the same values as they do, and therefore all else equal would prefer to have resources themselves rather than giving them to the other agent”.) And based on this I claim that a dictatorship could in principle survive (at least for a while) even when every citizen actually shared exactly the same selfless anti-dictator values, as long as the common knowledge equilibrium were bad enough.
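To spell out how a bad equilibrium can persist even with fully shared values, here’s a minimal stag-hunt-style sketch (the three-citizen setup and all payoffs are invented for illustration):

```python
# Minimal coordination-trap sketch: three citizens with identical, fully
# selfless anti-dictator values (all payoffs invented for illustration).
# Every citizen evaluates outcomes with the SAME function over world states,
# yet "nobody revolts" is still a Nash equilibrium.

THRESHOLD = 3  # the revolt only succeeds if all three citizens join

def shared_payoff(num_revolting):
    # Shared value over world states, not indexed to any individual citizen.
    if num_revolting >= THRESHOLD:
        return 10.0                  # dictatorship falls: great for everyone
    return -1.0 * num_revolting      # failed revolters get punished: bad for everyone

# If the other two stay home, my joining only makes the world worse (-1 < 0),
# so "all stay home" is an equilibrium, even though "all revolt" (payoff 10) exists.
print(shared_payoff(1), shared_payoff(0))   # -1.0 0.0  -> staying home is a best response
print(shared_payoff(3), shared_payoff(2))   # 10.0 -2.0 -> all-revolt is also an equilibrium
```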
maybe we use cached “procedural values” and this explains some amount of cooperation beyond standard game theory
Yes. Except that our attitudes towards this differ: you seem to think of it as an edge case, whereas I think of it as an anomaly pointing us towards ways in which game theory is the wrong framework to be using. Specifically, I think that game theory makes it hard to study group agents, because it assumes that each agent’s strategy is derived from its terminal values. However, in practice the way that group agents work is by programming heuristics/ethics/procedural values into their subagents. So “cached” is an inaccurate description: it’s not that the individual derives those values rationally and then stores them for later, but rather that they’ve been “programmed” with them (e.g. via reinforcement learning).
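One way to see the contrast (a purely illustrative toy sketch, not a formal model):

```python
# Toy contrast between the two pictures (illustrative only).

def derived_strategy(options, terminal_utility):
    # Classical game-theoretic picture: the agent re-derives its strategy
    # from its own terminal utility at decision time.
    return max(options, key=terminal_utility)

def programmed_subagent(situation, installed_rules):
    # Group-agent picture: the subagent executes heuristics / procedural values
    # that the group installed in it (e.g. via reinforcement learning), without
    # re-deriving them from any terminal utility of its own.
    return installed_rules[situation]

# A selfish terminal utility would recommend stealing once nobody is watching...
selfish_utility = {"steal": 5.0, "pay": 0.0}.get
print(derived_strategy(["steal", "pay"], selfish_utility))    # steal

# ...but the installed rule keeps firing even in that situation.
rules = {"honesty_box": "pay", "empty_town_honesty_box": "pay"}
print(programmed_subagent("empty_town_honesty_box", rules))   # pay
```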
(Of course, ideally the study of group/distributed agents and the study of individual/centralized agents will converge, but I think the best way to do that is to separately explore both perspectives.)
ETA: Looks like you edited your comment quite a bit from the version I replied to.
Sorry about this!