Have you seen this post, which looks at the setting you mentioned?
From my perspective, I want to know why it makes sense to assume that the AI system will have preferences over world states, before I start reasoning about that scenario. And there are reasons to expect something along these lines! I talk about some of them in the next post in this sequence! But I think once you’ve incorporated some additional reason like “humans will want goal-directed agents” or “agents optimized to do some tasks we write down will hit upon a core of general intelligence”, then I’m already on board that you get goal-directed behavior, and I’m not interested in the construction in this post any more. The only point of the construction in this post is to demonstrate that you need this additional reason.