My largest share of probability of survival under business-as-usual AGI (i.e., no major technological departures from LLMs, no pause, no sudden miracles in theoretical alignment, and no sudden AI winters) belongs to the scenario where the concept representations used by brains, representations that are efficiently learnable in principle, and representations learnable by current ML models secretly have very large overlap. In that case, even if LLMs develop “alien thought patterns”, those patterns arrive as an addition to the rest of their reasoning machinery rather than as its core. The result is that human values are not only easily learnable but also “atomic”, in the sense that complex concepts develop by combining atomic concepts instead of being rewritten from scratch.
This world is meaningfully different from the world where AIs are secretly easy to steer, because easy-to-steer AGIs run into the standard problems of wish-phrasing. I don’t believe we will actually get that much better at wish-phrasing on the BAU path, and I don’t think we will leverage easy-to-steer early AGIs to reach a good ending (again, keeping the BAU assumption). If AGIs are too easy to steer, they will probably be pushed around by reward signals until they start optimizing for something simple and clearly useful, like survival or profit.