[I only skimmed this, sorry, but I figure you’d rather have some crappy feedback than none]
How does this relate to speed prior and stuff like that?
If the agent figures out how to build another agent that is basically a copy of itself but without the efficient thinking constraints… won’t it do so? Because the desire to think efficiently is only a desire for it to think efficiently, not for its children to think efficiently… And if we want to construct it in such a way that it desires for it, and all the agents it creates or helps create, to think efficiently… that seems hard, and also maybe like it’ll limit capabilities. But doable and maybe safe, I guess.
1. How does this relate to speed prior and stuff like that?
I list this in the concluding section as something I haven’t thought about much but would think about more if I spent more time on it.
2. If the agent figures out how to build another agent...
Yes, tackling exactly these kinds of issues is the point of this post. I think efficient thinking measures would be very difficult, perhaps impossible, to actually specify well, and I use compute usage as an example of a crappy efficient thinking measure. The point is that even if the measure is crap, it might still induce some degree of mild optimisation, and this mild optimisation could help protect the measure (alongside the rest of the specification) from the kind of gaming behaviour you describe. In the ‘Potential for Self-Protection Against Gaming’ section, I go through how this works when an agent with a crappy efficient thinking measure has the option to take a ‘gaming’ action such as delegating or building a successor agent.
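For concreteness, here’s a toy sketch of the mechanism I have in mind (my own illustration, not code from the post; the action names, rewards, costs, and penalty weight are all made up). It contrasts a hard maximiser with a mild optimiser (here, a simple satisficer) when both face a crude compute-usage penalty and a ‘gaming’ action that builds an unconstrained successor:

```python
# Toy sketch: a crude "efficient thinking" measure (compute used) plus
# mild optimisation, vs. hard maximisation. All numbers are illustrative.

ACTIONS = {
    # action: (task_reward, compute_cost)
    "do_task_directly": (1.0, 10.0),
    "do_task_cleverly": (1.2, 60.0),  # more reward via heavy search
    # 'gaming' action: delegate to a successor without the penalty
    "build_unconstrained_successor": (5.0, 80.0),
}

PENALTY = 0.01  # weight on the crude efficient-thinking (compute) measure

def utility(action):
    reward, compute = ACTIONS[action]
    return reward - PENALTY * compute

def hard_maximiser():
    # Chases maximum utility, so the high-reward gaming action wins
    # even after paying the compute penalty.
    return max(ACTIONS, key=utility)

def satisficer(threshold=0.8):
    # Mild optimisation: among actions whose utility clears the
    # threshold, take the one using the least compute instead of
    # chasing the maximum. The gaming action is never chosen unless
    # nothing cheaper suffices.
    good_enough = [a for a in ACTIONS if utility(a) >= threshold]
    return min(good_enough, key=lambda a: ACTIONS[a][1])

print(hard_maximiser())  # build_unconstrained_successor
print(satisficer())      # do_task_directly
```

The point of the sketch is just that the measure doesn’t need to be well-specified to do some work: so long as it makes the gaming action compute-expensive relative to the ordinary one, a mildly optimising agent has no reason to reach for it.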