Thanks, that’s useful. I’ll think about how to formalise this correctly. Ideally I want a design where we’re still safe if a) the AI knows, correctly, that pressing a button will give it extra resources, but b) still doesn’t press it because it’s not part of its description.