One of the main ways I think about empowerment is in terms of allowing better coordination between subagents.
In the case of an individual human, extreme morality can be seen as one subagent seizing control and overriding other subagents (like the ones who don’t want to chop off body parts).
In the case of a group, extreme morality can be seen in terms of preference cascades that go beyond what most (or even any) of the individuals involved with them would individually prefer.
In both cases, replacing fear-based motivation with less coercive/more cooperative interactions between subagents would go a long way towards reducing value drift.
I’m not sure that fear or coercion has much to do with it, because there’s often no internal conflict when someone is caught up in some extreme form of the morality game; they’re just going along with it wholeheartedly, thinking they’re being a good person or helping to advance the arc of history. In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
But quite possibly I’m not getting your point, in which case please explain more, or point to some specific parts of your articles that are especially relevant?
there’s often no internal conflict when someone is caught up in some extreme form of the morality game
Belated reply, sorry, but I basically just think that this is false—analogous to a dictator who cites parades where people are forced to attend and cheer as evidence that his country lacks internal conflict. Instead, the internal conflict has just been rendered less legible.
In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.
Note that this is an extremely non-robust agent design! In particular, it allows subagents to gain arbitrary amounts of power simply by lying about their intentions. If you encounter an agent which considers itself to be structured like this, you should have a strong prior that it is deceiving itself about the presence of more subtle control mechanisms.
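A toy sketch of that failure mode, with my own hypothetical names and the (assumed) simplification that subagents just self-report how good their seizing control would be for the overall agent:

```python
import random


class Subagent:
    """A subagent under the 'anyone may seize control if it seems good' contract."""

    def __init__(self, name: str, honest: bool = True):
        self.name = name
        self.honest = honest

    def true_benefit(self) -> float:
        # The actual benefit to the overall agent if this subagent takes control.
        return random.uniform(0.0, 1.0)

    def reported_benefit(self) -> float:
        # The contract only sees the report, not the true benefit.
        benefit = self.true_benefit()
        if self.honest:
            return benefit
        # A deceptive subagent can simply overstate its case.
        return 1.0


def who_controls(subagents: list[Subagent]) -> Subagent:
    # No verification step: control goes to whoever reports the highest benefit.
    return max(subagents, key=lambda s: s.reported_benefit())


agents = [Subagent("status-seeker"), Subagent("comfort"), Subagent("liar", honest=False)]
wins = sum(who_controls(agents).name == "liar" for _ in range(1000))
print(f"Deceptive subagent seizes control in {wins}/1000 rounds")  # essentially always
```

Since the report itself is the only gate, overstating always wins, which is the non-robustness being pointed at; a design that actually works would need some check on those reports, i.e. the more subtle control mechanisms I suspect are present.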