Veedrac comments on Optimality is the tiger, and agents are its teeth

Veedrac 28 Nov 2024 2:54 UTC
2 points
Fundamentally, the story was about the failure cases of trying to make capable systems that don’t share your values safe by blocking the specific means by which their problem-solving capabilities express themselves in scary ways. This is different to what you are getting at here, which is having those systems actually operationally share your values. A well-aligned system, in the traditional ‘Friendly AI’ sense of alignment, simply won’t make the choices that the one in the story did.