In the context of natural impact regularization, it would be interesting to explore some @TurnTrout-style power-seeking theorems for subagents. (Yes, I know he denounces the power-seeking theorems, but I still like them.)
Specifically, consider this setup: Agent U starts a number of subagents S1, S2, S3, …, with the subagents being picked according to U’s utility function (or decision algorithm or whatever). Now, would S1 seek power? My intuition says, often not! If S1 seeks power in a way that takes away power from S2, that could disadvantage U. So basically S1 would only seek power in cases where it expects to make better use of the power than S2, S3, ….
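To make the intuition concrete, here is a minimal toy model (entirely hypothetical, with made-up numbers): U's utility is the sum of what its subagents produce from a fixed pool of "power" (resources), and each subagent converts resources into U-utility at its own rate. Whether S1 grabbing extra power helps or hurts U then just depends on whether S1's conversion rate beats that of the subagents it displaces.

```python
# Toy model: U's utility is linear in each subagent's share of a fixed
# resource pool, weighted by that subagent's productivity (made-up values).

def u_utility(allocation, productivity):
    """U's utility: sum over subagents of (resources held) * (productivity)."""
    return sum(a * p for a, p in zip(allocation, productivity))

productivity = [1.0, 2.0, 1.5]   # S1, S2, S3: utility produced per unit of power
fair = [1.0, 1.0, 1.0]           # baseline: power split equally
grab = [2.0, 0.5, 0.5]           # S1 seizes power from S2 and S3

print(u_utility(fair, productivity))   # 4.5
print(u_utility(grab, productivity))   # 3.75 -- the grab made U worse off
```

Since S2 and S3 here are more productive than S1, S1's power grab lowers U's utility, so a U-aligned S1 would refrain; flip the productivities and the grab becomes worthwhile. That is exactly the "only seek power when you expect to use it better than your siblings" condition above.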
Obviously this may be hard for us to make use of if we are trying to build an AI and only know how to make dangerous utility maximizers. But if we're happy with the kind of maximizers we can make at the first order (as seems to apply to the SOTA, since current methods aren't really utility maximizers) and are mainly worried about the mesaoptimizers they might spawn, this sort of theorem would suggest that those mesaoptimizers would prefer to stay nice and bounded.
I object to this thought experiment on the same basis as the problem with the GLUT: "Bob, who loves joy and beauty and niceness and so on" is a high-information concept that would not have appeared by chance. Some process had to fill in Bob's details, and the tyranny/slavery/poultry-keeping can be attributed to that process rather than to you, who merely diverted a boulder and contributed only 1 bit of information.