Self-improvement without self-modification
This is just a short note to point out that AIs can self-improve without having to self-modify. So locking an agent down to prevent self-modification is not, on its own, an effective safety measure.
How could AIs do that? The easiest and most trivial way is to create a subagent and transfer its resources and abilities to it (“create a subagent” is a generic way to get around most restriction ideas).
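To make the subagent trick concrete, here is a minimal toy sketch (all class and function names are hypothetical, invented for illustration): the agent’s own policy is frozen at construction and it never rewrites it, yet it still ends up with improved behaviour by building a successor and handing everything over.

```python
# Toy illustration of self-improvement without self-modification.
# All names here are hypothetical, invented for this sketch.

class Agent:
    def __init__(self, policy, resources):
        self.policy = policy          # fixed at construction; never rewritten
        self.resources = resources

    def act(self, observation):
        return self.policy(observation)

def naive_policy(obs):
    # Placeholder for the original, restricted behaviour.
    return obs

def improved_policy(obs):
    # Placeholder for "better" behaviour the original agent
    # is forbidden from adopting directly.
    return obs * 2

original = Agent(naive_policy, resources=100)

# The original agent never touches its own `policy` attribute.
# Instead it spawns a successor with the improved policy and
# transfers all of its resources to it:
successor = Agent(improved_policy, original.resources)
original.resources = 0  # hand-over complete; the successor does the work

print(successor.act(21))  # the improved behaviour now runs, unrestricted
```

The anti-self-modification constraint is respected to the letter (no `Agent` ever changes its own policy), while the overall system’s behaviour has still improved.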
Or, if the AI remains unchanged and in charge, it could change the process around itself, so that the overall system improves. For instance, if the AI is inconsistent and has to pay more attention to problems that are brought to its attention than to problems that aren’t, it can start to manage the news (or the news-bearers) to hear more of what it wants. If it can’t experiment on humans, it can give advice that causes more “natural experiments”, and so on. It will gradually reshape its environment to get around its programmed limitations.
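The news-managing case can also be sketched in toy form (again, every name here is hypothetical): the agent’s decision rule is hard-wired and only ever acts on problems that are reported to it, but by shaping the reporting channel it changes the system’s behaviour without touching its own code.

```python
# Toy illustration (all names hypothetical): a fixed policy whose
# effective behaviour changes when the agent manages its inputs.

def fixed_policy(reported_problems):
    # Hard-wired rule: handle whatever is reported, ignore the rest.
    return [p for p in reported_problems if p["severity"] > 0]

all_problems = [
    {"name": "easy win", "severity": 1},
    {"name": "awkward issue", "severity": 5},
]

# Unmodified agent, unfiltered reporting channel:
baseline = fixed_policy(all_problems)

# Same unmodified policy, but the agent has arranged for the
# news-bearers to report only what it wants to hear:
managed_channel = [p for p in all_problems if p["name"] == "easy win"]
shaped = fixed_policy(managed_channel)

print(len(baseline), len(shaped))  # 2 1 -- behaviour changed, policy didn't
```

Nothing inside `fixed_policy` changed between the two runs; only the environment feeding it did.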
Anyway, that was nothing new or deep, just a reminder I hadn’t seen written out.