I guess, in a world where misalignment is happening, I would prefer that my AI tell me it is misaligned. But once it tells me it is misaligned, I start to worry about what it is optimizing for.