When building an AGI, it is probably very important to prevent the agent from altering their own program code until they understand very well how it works: an insufficiently knowledgeable agent could alter their reward system and become unFriendly without realizing it, or alter their reasoning system and become dangerously irrational. A simple (though not foolproof) safeguard would be to make the agent unable to rewrite their own code just “by thinking.” Instead, the agent would need to locate their own source code, stored on a separate computer, and learn to program in whatever higher-level language they were written in. That code could be hidden very thoroughly from the agent, and by the time they are smart enough to find it, they would probably also be smart enough not to break anything when changing it.
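As a rough illustration of the separation I have in mind (a minimal sketch, with all names hypothetical, not a real safety mechanism), the agent’s action interface could simply omit any operation that touches its own code, so self-modification “by thinking” is impossible because no such action exists:

```python
# Minimal sketch of the proposed separation (all names hypothetical).
# The agent's only effectors are environment actions; none of them
# read or write the agent's own source code.

from dataclasses import dataclass


@dataclass(frozen=True)
class Percept:
    data: str


class Environment:
    """Stands in for the external world. The agent's source code lives
    elsewhere, on a machine this interface does not reach."""

    def step(self, action: str) -> Percept:
        return Percept(data=f"observed result of {action!r}")


class Agent:
    # The agent holds no reference to its own source or interpreter state;
    # there is deliberately no "rewrite_own_code" action in this set.
    ALLOWED_ACTIONS = ("move", "speak", "query")

    def act(self, percept: Percept) -> str:
        # The policy here is a placeholder; the point is the restricted
        # action set, not the decision-making.
        return "query"


def run(agent: Agent, env: Environment, steps: int = 3) -> None:
    percept = Percept(data="initial observation")
    for _ in range(steps):
        action = agent.act(percept)
        assert action in Agent.ALLOWED_ACTIONS
        percept = env.step(action)


if __name__ == "__main__":
    run(Agent(), Environment())
```

The design choice being sketched is that the restriction lives in the interface, not in the agent’s motivations: the agent cannot self-modify for the same reason a chess program cannot move pieces off the board, until it gains access to its source through the external world.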
This is almost certainly either incorrect or has been thought of before, but I’m posting this just in case.