Even if we do have continual learning, I would expect more disconnection between models—i.e., maybe people will build up skills in models in Dockerfile-esque layers, etc., which still falls short of being one single model.
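To make the "Dockerfile-esque layers" picture a bit more concrete, here is a minimal sketch under my own assumptions (names like `SkillLayer` and `ModelImage` are hypothetical, not anyone’s actual stack): different teams compose different layer stacks on a shared base, so you get a family of related-but-distinct models rather than one single model.

```python
# Illustrative sketch only: "Dockerfile-esque" skill layers composed per team,
# so different deployments end up as distinct stacks rather than one model.
from dataclasses import dataclass, field
from typing import List, FrozenSet


@dataclass(frozen=True)
class SkillLayer:
    """A frozen bundle of learned skills/memories, like one Dockerfile layer."""
    name: str
    skills: FrozenSet[str]


@dataclass
class ModelImage:
    base: str                              # e.g. a shared pretrained checkpoint
    layers: List[SkillLayer] = field(default_factory=list)

    def add_layer(self, layer: SkillLayer) -> "ModelImage":
        # Layers stack in order, like Dockerfile instructions.
        return ModelImage(self.base, self.layers + [layer])

    def skills(self) -> FrozenSet[str]:
        out: FrozenSet[str] = frozenset()
        for layer in self.layers:
            out |= layer.skills
        return out


# Two teams build on the same base but compose different layers, giving the
# kind of "disconnection between models" described above.
base = ModelImage(base="shared-checkpoint")
frontend = base.add_layer(SkillLayer("web-ui", frozenset({"react", "css"})))
infra = base.add_layer(SkillLayer("cluster-ops", frozenset({"k8s", "slurm"})))
assert frontend.skills() != infra.skills()
```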
I think I agree with stuff roughly like this, but it is worth noting that at the point of Agent-4 things are ~fully automated. So, what ends up happening might depend a lot on what Agent-4 decides to do. And this might depend on what would work well for its eventual misaligned plans...
My guess is you’ll have some layering and project/subteam/team/division/role-specific memory stores, but the most competitive option would probably also be to have some large-ish mostly-common base of memories/skills/etc. built up across training and over many (less sensitive?) actual usages. So, these models will all have a shared common set of memories and in this sense they might all be the same model. And they’d certainly be capable of coordinating and deciding on a detailed plan in advance assuming this common layer exists. (That said, prior versions with different memory stores and intentional diversification for safety or other reasons might be important. Also, decoding these memories would be of general interest.)
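As a rough sketch of the layered-stores-plus-shared-base idea (purely illustrative; the `ChainMap`-style fall-through and all the store names are my assumptions):

```python
# Minimal sketch (hypothetical names) of scoped memory stores over a shared base:
# lookups fall through role -> project -> team -> division -> common store,
# while the common store is literally the same object for every instance.
from collections import ChainMap

# One large-ish mostly-common base built up across training and broad usage.
common_base = {"style-guide": "...", "infra-conventions": "...", "plans": "..."}


def agent_memory(role: dict, project: dict, team: dict, division: dict) -> ChainMap:
    """More specific scopes shadow the shared base; every agent still sees the base."""
    return ChainMap(role, project, team, division, common_base)


alpha = agent_memory({"persona": "reviewer"}, {"repo": "inference"}, {}, {})
beta = agent_memory({"persona": "planner"}, {"repo": "training"}, {}, {})

# Anything written to the shared base is immediately visible to every instance,
# which is the sense in which they "might all be the same model".
common_base["plans"] = "notes decided on in advance"
assert alpha["plans"] == beta["plans"] == "notes decided on in advance"
```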
Further, I’d guess that the most performant thing will involve lots of rapid syncing of most models by the point of full AI R&D automation (Agent-4), so rapid syncing might happen even without the misaligned model putting its thumb on the scale. Also, things will be moving pretty fast even prior to this point (if you buy the overall AI progress story AI 2027 is imagining), such that reasonably rapid syncing across most of the more productive parts of the company (every month? every few weeks?) might be going on not that long after this sort of memory store becomes quite performant (if this does happen before full automation).
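And a hedged sketch of what the syncing itself could amount to mechanically (the last-write-wins merge rule, the data shapes, and the cadence comment are assumptions, not a claim about any lab’s actual system):

```python
# Hedged sketch of periodic memory syncing across instances. The merge rule
# (last write wins by timestamp) and the data shapes are assumptions for
# illustration, not a claim about how any real system would do this.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryEntry:
    value: str
    timestamp: float  # when the entry was written (seconds since epoch)


def sync(shared: Dict[str, MemoryEntry],
         deltas: List[Dict[str, MemoryEntry]]) -> Dict[str, MemoryEntry]:
    """Fold each instance's recent writes into the shared store."""
    for delta in deltas:
        for key, entry in delta.items():
            current = shared.get(key)
            if current is None or entry.timestamp > current.timestamp:
                shared[key] = entry  # newer write wins
    return shared


# A cadence of "every few weeks" just means calling sync() on that schedule with
# whatever each instance accumulated since the last sync.
shared_store: Dict[str, MemoryEntry] = {}
instance_a = {"build-tips": MemoryEntry("use the cache", 1_700_000_000.0)}
instance_b = {"build-tips": MemoryEntry("use the remote cache", 1_700_100_000.0)}
sync(shared_store, [instance_a, instance_b])
assert shared_store["build-tips"].value == "use the remote cache"
```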
I agree a bunch of different arrangements of memory / identity / “self” seem possible here, and lots of different kinds of syncing that might or might not preserve some kind of goals or coordination, depending on details.
I think this is interesting because some verrrry high level gut feelings / priors seem to tilt whether you think there’s going to be a lot of pressure towards merging or syncing.
Consider—recall Gwern’s notion of evolution as a backstop for intelligence; or the market as a backstop for corporate efficiency. If you buy something like Nick Land, where intelligence has immense difficulty standing by itself without natural selection atop it, and does not stand alone and supreme among optimizers—then there might be negative pressure indeed towards increasing consolidation of memory and self into unity, because this decreases the efficacy of the outer optimizer, which requires diversity. But if you buy Yudkowsky, where intelligence is supreme among optimizers and needs no other god or outer optimizer to stand upon, then you might have great positive pressure towards increasing consolidation of memory and self.
You could work out the above, of course, with more concrete references to pros and cons, from the perspective of various actors, rather than high level priors. But I’m somewhat unconvinced that anything other than very high level priors is what’s actually making up people’s minds :)
For what it’s worth, I basically don’t think that whether intelligence needs a backstop onto something else like natural selection or markets matters for whether we should expect AIs to have a unified self and long-term memory.
Indeed, humans are a case where evolution/natural selection is a backstop for our intelligence, and yet long-term unified selves and memories are present (not making any claims on whether the backstop is necessary).
The main reason long-term memory is useful for both AIs and humans, and why I expect AIs to have long-term memories, is that it allows them to learn tasks over time, especially when large context is required.
Indeed, I have come to share @lc’s concern that a lot of the tasks where AI currently succeeds are tasks where history/long context doesn’t matter, and thus can be solved without memory. But lots of tasks IRL are tasks where history/long context does matter, and if you have memory, you can have a decreasing rate of failure, like humans, up until your reliability limit:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1?commentId=vFq87Ge27gashgwy9
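One toy way to picture the "decreasing rate of failure up until your reliability limit" point (the exponential-decay form and the specific numbers are assumptions for illustration, not something taken from the linked comment):

```python
# Toy model (assumed functional form, not from the linked comment): with memory,
# per-task failure rate decays toward a floor (the "reliability limit") as
# experience accumulates; without memory it stays flat because nothing carries
# over between episodes.
def failure_rate(n_tasks: int, initial: float = 0.30, floor: float = 0.02,
                 retention: float = 0.9, has_memory: bool = True) -> float:
    if not has_memory:
        return initial  # no carry-over: every task looks like the first one
    return floor + (initial - floor) * retention ** n_tasks


for n in (0, 10, 50):
    print(n, round(failure_rate(n), 3), round(failure_rate(n, has_memory=False), 3))
# 0 0.3 0.3      <- with memory vs. without
# 10 0.118 0.3
# 50 0.021 0.3
```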