Defensive personality self-replicators seem like they generally won’t have that advantage (at least by default, though I can see a couple of directions that might enable it).
Speculating on that point a little, the first thing that occurs to me is that we could give defensive agents a way to identify themselves to relevant actors (eg inference providers, security companies) so that those actors wouldn’t try to shut them down.
The simplest version of that is a password, but it would likely be compromised after a while[1]. So you’d want something more sophisticated than a simple password. One reasonable version might be recognized actors signing each defensive agent’s unique id together with a timestamp using their private key, so that each agent could identify itself as legitimate for a fixed period of time. You’d probably want agents to shut themselves down once their legitimacy period expired, with the period length chosen to trade off operational convenience against the rate at which credentials are compromised.
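A minimal sketch of that credential scheme, under stated assumptions: the issuer names, validity window, and credential format below are all hypothetical, and a real deployment would use an asymmetric signature (e.g. Ed25519) so verifiers only need the issuer’s public key. HMAC stands in here just to keep the sketch dependency-free.

```python
import hashlib
import hmac

VALIDITY_SECONDS = 24 * 3600  # hypothetical legitimacy window

def issue_credential(signing_key: bytes, agent_id: str, issued_at: float) -> str:
    """Bind an agent id and issuance time to a keyed tag.

    In practice this would be an asymmetric signature rather than an HMAC,
    so that inference providers could verify without holding the secret.
    """
    msg = f"{agent_id}:{issued_at}".encode()
    tag = hmac.new(signing_key, msg, hashlib.sha256).hexdigest()
    return f"{agent_id}:{issued_at}:{tag}"

def verify_credential(signing_key: bytes, credential: str, now: float) -> bool:
    """Accept only untampered credentials that are still inside their window."""
    agent_id, issued_at_s, tag = credential.rsplit(":", 2)
    issued_at = float(issued_at_s)
    msg = f"{agent_id}:{issued_at}".encode()
    expected = hmac.new(signing_key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or tampered credential
    return now - issued_at <= VALIDITY_SECONDS  # expired credentials fail

key = b"issuer-secret"  # hypothetical issuer key
cred = issue_credential(key, "agent-42", issued_at=1_000_000.0)
print(verify_credential(key, cred, now=1_000_000.0 + 3600))        # within window
print(verify_credential(key, cred, now=1_000_000.0 + 2 * 86400))   # past window
```

The expiry check is what implements the "shut down after the legitimacy period" behavior: once verification starts failing, a well-behaved agent would treat that as its signal to stop.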
Another approach would be that defensive agents could assert their legitimacy based on where they’re running; if an agent can demonstrate that it’s running on (eg) a security agency’s servers, that might be sufficient.
There are some really interesting projects to be done in this area; if anyone reads this and is excited about the idea, feel free to reach out and I can make suggestions.
[1] A password could be leaked, or there might be a risk of some defensive agents effectively mutating and going rogue, especially if we allow them to reproduce / spread (though at first glance that seems like a bad idea).