Solid comment, thanks. I agree with nearly all of this. I chose to highlight the question of whether Claude et al. are really aligned because it feels like an important prerequisite to the next couple of posts, which are forthcoming. I think “incoherent enough to be safe but coherent enough to do alignment research” is a very unstable and unlikely state.