It was partly to point out that, with a little hand-engineering of the agents, you can get self-modification hazards in a substantially less complex setup than your proposal; and since none of the AI safety gridworld problems can be said to be rigorously solved, there's no need yet for more realistic self-modification environments.