Like I mentioned above, I mostly think of narrow value learning as a substitute for imitation learning or approval-direction, realistically to be used as a distillation step rather than as your whole AI. In particular, an agent trained with narrow value learning is probably not aligned+competitive in a way that might allow you to apply this kind of strategy-stealing argument.
In concrete approval-directed agents I’m talking about a different design; it’s not related to narrow value learning.
I don’t use “narrow” and “short-term” interchangeably. I’ve only ever used “narrow” in the context of value learning, in order to make this particular distinction between two different goals you might have when doing value learning.
Ah, that clears up a lot of things for me. (I saw your earlier comment but was quite confused by it due to not realizing your narrow / short-term distinction.) One reason I thought you used “short-term” and “narrow” interchangeably is due to Act-based agents where you seemed to be doing that:
These proposals all focus on the short-term instrumental preferences of their users. [...]
What is “narrow” anyway?
There is clearly a difference between act-based agents and traditional rational agents. But it’s not entirely clear what the key difference is.
And in that post it also seemed like “narrow value learners” were meant to be the whole AI since it talked a lot about “users” of such AI.
(In that post I did use “narrow” in the way we are currently using “short-term”, contrary to my claim in the grandparent. Sorry for the confusion this caused.)