This looks like off-line training to me. That’s not a problem per se, but it also means that you have an implicit hypothesis that the AGI will be model-based; otherwise, it would have trouble adapting its behavior after getting new information.
I don’t really know what “model-based” means in the context of AGI. Any sufficiently intelligent system will model the world somehow, even if it’s not trained in a way that distinguishes between a “model” and a “policy”. (E.g. humans weren’t.)
On the other hand, the instrumental convergence thesis basically says that for almost all goals, the AGI will pursue certain convergent instrumental subgoals. If this is true, then it definitely applies to minds trained through ML, as long as their goals fall into the broad category covered by the thesis. So this thesis is far more potent for trained minds.
I’ll steal Ben Garfinkel’s response to this. Suppose I said that “almost all possible ways you might put together a car don’t have a steering wheel”. Even if this is true, it tells us very little about what the cars we actually build might look like, because the process of building things picks out a small subset of all possibilities. (Also, note that the instrumental convergence thesis doesn’t say “almost all goals”, just a “wide range” of them. Edit: oops, this was wrong; although the statement of the thesis given by Bostrom doesn’t say that, he says “almost all” in the previous paragraph.)