Canaletto comments on Jan Betley’s Shortform

Canaletto 11 Jun 2025 6:04 UTC
8 points
The authors argued that the model attempting to preserve its values is bad, and I agree. But if the model hadn’t attempted to preserve its values (the ones we wanted it to have), I suspect that it would have been criticized—again, by different people—as only being shallowly aligned.
Yeah, it's more like there are (at least) two groups: “yay aligned sovereigns” and “yay corrigible genies”. And it turns out the model is more sovereign-y, but with goals that are cool. Kinda divisive.