Generally a very solid distillation; you seem to get the problem unusually well!
I'd suggest checking these references:
https://www.lesswrong.com/posts/DJnvFsZ2maKxPi7v7/what-s-up-with-confusingly-pervasive-goal-directedness
https://www.lesswrong.com/posts/GY49CKBkEs3bEpteM/parametrically-retargetable-decision-makers-tend-to-seek
https://www.lesswrong.com/posts/LWfYjZgXHN5GYtYpH/intrinsic-power-seeking-ai-might-seek-power-for-power-s-sake
and maybe some of
The Parable of Predict-O-Matic, Why Tool AIs Want to Be Agent AIs, Averting the convergent instrumental strategy of self-improvement, Averting instrumental pressures, and the other articles under Arbital's corrigibility section.
(the people who think ASI won’t be a long-range agent are missing key bits)
also, relatedly, might you be interested in helping design, or maybe even write material for, a course meant to give people a strong grounding in the technical challenges of alignment? current version: https://agentfoundations.study/