This lets us adjust their morality until the AIs act sensibly.
The difficult thing isn’t to have the AI act sensibly in the medium term. The difficult thing is to have its values stay stable under self-modification, and to get the complex problems right, like not wireheading everyone.
This would definitely let you test the values-stay-stable-under-self-modification part. Just plonk the AI in an environment where it can self-modify, and keep track of its values. Since this isn’t dependent on morality, you can just give it easily measurable values.
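A minimal sketch of what such a test harness might look like, assuming the "values" are an explicit, directly measurable utility function and "self-modification" is a perturbation of the agent's own parameters (the `Agent` class, the drift tolerance, and the modification step are all hypothetical illustrations, not anyone's actual proposal):

```python
import random

# Hypothetical sketch: give the agent explicit, easily measurable
# "values" (here, a weight vector), let it modify its own parameters,
# and check whether those values drift over time.

TARGET_VALUES = [1.0, 0.0, 0.5]  # the values we want it to preserve


class Agent:
    def __init__(self):
        self.values = list(TARGET_VALUES)

    def self_modify(self):
        # Stand-in for real self-modification: the agent rewrites one
        # of its own parameters, which may perturb its values.
        i = random.randrange(len(self.values))
        self.values[i] += random.gauss(0, 0.01)

    def value_drift(self):
        # Measurable drift from the original values (Euclidean distance).
        return sum((v - t) ** 2
                   for v, t in zip(self.values, TARGET_VALUES)) ** 0.5


agent = Agent()
for step in range(1000):
    agent.self_modify()
    if agent.value_drift() > 0.1:  # hypothetical tolerance
        print(f"value drift exceeded tolerance at step {step}")
        break
else:
    print("values remained stable under self-modification")
```

The point is just that "did the values drift?" becomes a crisp, checkable property once the values are something you can measure directly, independent of any moral content.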