I’m sceptical of any approach to alignment that involves finding a perfect ungameable utility function.
Even if you could find one, and even if you could encode it accurately as the training objective, that only affects outer alignment.
What really matters for AI safety is inner alignment: whether the objective the trained model actually pursues matches the one it was trained on. And a learned objective is very unlikely to pick up all the subtle nuances of a complex utility function.