Evan R. Murphy comments on Why Instrumental Goals are not a big AI Safety Problem

Evan R. Murphy 9 Apr 2022 22:52 UTC
1 point
Hmm, well, if A is being trained the same way, using deep learning to become an agentic system, then it is subject to mesa-optimization and to having goals, isn’t it? And if it is subject to mesa-optimization, do you have a way to address inner alignment failures like deceptive alignment? Oversight alone can be thwarted by a deceptively aligned mesa-optimizer.
You might be able to address this by giving the overseer good enough transparency tools, but such tools don’t exist yet.