[Question] In the Short-Term, Why Couldn’t You Just RLHF-out Instrumental Convergence?

I ran a poll asking “Do you expect instrumental convergence to become a big pain for AGI labs within the next two years?” About a quarter of the 400 people who answered said “Yes”.

I would like to hear how people could see this happening within two years and, in particular, the most important reasons why labs couldn’t just erase problematic instrumental convergence with RLHF, Constitutional AI, or similar techniques.

Toy concrete scenarios would be most helpful.

You can find the Twitter version of this question here.
