In the context of natural impact regularization, it would be interesting to try to explore some @TurnTrout-style powerseeking theorems for subagents. (Yes, I know he denounces the powerseeking theorems, but I still like them.)
Specifically, consider this setup: Agent U starts a number of subagents S1, S2, S3, …, with the subagents being picked according to U’s utility function (or decision algorithm or whatever). Now, would S1 seek power? My intuition says, often not! If S1 seeks power in a way that takes away power from S2, that could disadvantage U. So basically S1 would only seek power in cases where it expects to make better use of the power than S2, S3, ….
Obviously this may be kind of hard for us to make use of if we are trying to make an AI and we only know how to make dangerous utility maximizers. But if we’re happy with the kind of maximizers we can make on the first order (as seems to apply to the SOTA, since current methods aren’t really utility maximizers) and mainly worried about the mesaoptimizers they might make, this sort of theorem would suggest that the mesaoptimizers would prefer staying nice and bounded.
In the context of natural impact regularization, it would be interesting to try to explore some @TurnTrout-style powerseeking theorems for subagents. (Yes, I know he denounces the powerseeking theorems, but I still like them.)
Specifically, consider this setup: Agent U starts a number of subagents S1, S2, S3, …, with the subagents being picked according to U’s utility function (or decision algorithm or whatever). Now, would S1 seek power? My intuition says, often not! If S1 seeks power in a way that takes away power from S2, that could disadvantage U. So basically S1 would only seek power in cases where it expects to make better use of the power than S2, S3, ….
Obviously this may be kind of hard for us to make use of if we are trying to make an AI and we only know how to make dangerous utility maximizers. But if we’re happy with the kind of maximizers we can make on the first order (as seems to apply to the SOTA, since current methods aren’t really utility maximizers) and mainly worried about the mesaoptimizers they might make, this sort of theorem would suggest that the mesaoptimizers would prefer staying nice and bounded.