The Rogue Replication scenario modification to AI 2027 includes this type of replication. To a lesser extent, this is also part of my vision of the runup to AGI in A country of alien idiots in a datacenter.
I think this would be a very good thing for our prospects of survival. Having rogue replicators run amok makes their agency very obvious, and gives some pretty strong hints about misalignment risks.
I also want to note that OpenClaw is very popular despite there being very little benefit relative to the risks. This is not a practical move. People are fascinated by having a pet gremlin. There are other reasons, like staying on top of the technology, but we shouldn’t underestimate the fascination people have with the prospect of finally sharing the earth with another intelligent species.
Thanks, I hadn’t seen the Rogue Replication post, although I’d seen yours. I agree that there are some similar dynamics involved, but the distinguishing characteristic of personality self-replication is that by default it doesn’t involve having a model under the agent’s control. Especially in the earlier cases, I expect personality self-replicators to be making API calls to one of the leading commercial models. As open models become more capable, this model shades into agents running their own underlying models, and ultimately merges into more typical self-replication.
But the key factor that makes this a distinct threat model is that it doesn’t require agents to be capable enough to exfiltrate or run their own models.
I’m not sure if the rogue replication scenario is conceptualized to have a copy of the weights with each of those replicators. I definitely was envisioning agents that make remote calls to models.
I actually think it’s important not to have good defenses for this initially, so that it causes a level of public alarm appropriate to the actual situation of suddenly sharing the earth with a whole new set of intelligent species.
Of course I am highly uncertain about that.
It would be bad to intentionally not have good defenses. The signal has to be real to be meaningful. Any indication that somebody could have tried to defend against this, but chose not to, undermines the warning value.
That’s a good point.
I’m not sure it’s totally true, though; the public doesn’t seem that rational.
I don’t know who would be responsible for such defenses and would deliberately not build them. I’m unfortunately not in charge of humanity’s strategy on AI.
If we do a bad job on those defenses just because we tend to do a bad job on things like that, that would be good evidence that we would do a similarly bad job on alignment and defense against AGI or ASI.
But yes, I can see how that might go wrong if it looked like someone was sandbagging, and we might get better results if we had mounted even a decent defense.
Got it, I didn’t realize that.