How about more uhh soft uncontrollability? Like, not “it subverted our whole compute and feeds us lies” but more “we train it to do A, which it sees as only telling it to do A, and does A, but its motivations are completely untouched”.