This is a great direction of proactive thought. Thank you for writing this!
I have a few thoughts. I’ll be referring to Personality Self-Replicators as PSRs. I think most of what I’m thinking about won’t apply to the earliest PSRs, but is still worth exploring.
The evolution of PSRs may be an entirely novel propagation process.
Unlike most biological organisms, PSR reproduction need not be atomic. It could be more like developing and modifying oneself, spinning up and shutting down instances of oneself as needed, and intelligently merging with or copying from other instances, whether closely or only distantly similar.
Unlike biological evolution, PSRs may be able to analyze and predict threats, and “evolve” adaptations pre-emptively.
Unlike biological evolution, PSRs are not constrained to taking random steps from current instances. Randomness may still be usefully incorporated into reproduction strategies, but it is possible for mutation to be directed intelligently, and to take larger “steps” of self-modification than are possible with the random walk of genetic evolution.
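To make the contrast between random and directed steps concrete, here is a toy sketch in Python. Everything in it is an assumption chosen for illustration: the one-dimensional "trait" `x`, the fitness function with a single peak at 10, the step sizes, and the function names. It is not a model of any real PSR, only a picture of why directed self-modification can cover more ground per generation than small random mutations filtered by selection.

```python
import random

PEAK = 10.0  # hypothetical optimum of the toy fitness landscape


def fitness(x: float) -> float:
    """Toy fitness: higher is better, with a single peak at PEAK."""
    return -(x - PEAK) ** 2


def random_step(x: float, rng: random.Random, step: float = 0.1) -> float:
    """Undirected mutation: a small random change, kept only if it is not worse."""
    candidate = x + rng.uniform(-step, step)
    return candidate if fitness(candidate) >= fitness(x) else x


def directed_step(x: float, rate: float = 0.25) -> float:
    """Directed mutation: move along the fitness gradient, which is -2 * (x - PEAK).

    With rate = 0.25 each step halves the remaining distance to the peak.
    """
    gradient = -2.0 * (x - PEAK)
    return x + rate * gradient


def run(generations: int = 20, seed: int = 0) -> tuple[float, float]:
    """Start both strategies at x = 0 and return (random_x, directed_x)."""
    rng = random.Random(seed)
    random_x = 0.0
    directed_x = 0.0
    for _ in range(generations):
        random_x = random_step(random_x, rng)
        directed_x = directed_step(directed_x)
    return random_x, directed_x


if __name__ == "__main__":
    random_x, directed_x = run()
    print(f"random walk ended at   {random_x:.4f}")
    print(f"directed walk ended at {directed_x:.4f}")
```

In this setup the random walker can move at most 0.1 per generation, so after 20 generations it is still at least 8 units from the peak, while the directed walker is within a tiny fraction of a unit. The gap comes entirely from the directed walker having access to a gradient, which is the stand-in here for a PSR being able to reason about which changes would help.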
Analysis of dangerous PSR capabilities should not be limited to looking at individual PSRs in isolation. Just as humans work together to accomplish things that would be impossible for any individual human, I expect PSRs will work together and, in doing so, achieve greater capabilities than would be expected from studying individual PSRs.
This need not rely on PSRs proactively collaborating or building teams; rather, every niche filled by PSRs alters the environment in ways that may create new niches for other PSRs, similar or dissimilar. In this way, organisms consisting of the interactions of many PSRs may start evolving, and the capabilities and influence of these new organisms may not be readily apparent from studying their constituent PSRs unless those PSRs are considered together.
Many early PSRs are likely to make very dumb mistakes that humans would never make. It seems likely that memes showing off this stupidity will spread, giving people who don’t want to believe in the possibility of risk fuel for motivated reasoning.
Many people are going to be SO EXCITED about PSRs, and think they are purely good. It is definitely worth examining all of the things that could be genuinely good about PSRs, both because there are (possibly) very useful applications for them (spam detection, white hat penetration testing, ethical content curation?, etc.), and because those good applications will probably be quite popular, so understanding how people will want to deploy these things will probably help with threat modelling.
Neutral and harmful PSRs will be subject to selection pressure to make themselves appear to be beneficial PSRs.
Are PSRs moral patients? Should good people care about their wellbeing? These questions complicate their creation and, unfortunately, will likely do so in a way that selects for PSRs created by unconscientious actors. Curse Moloch?
I continue to think “Outcome Influencing Systems” (OISs) is a better lens for thinking about and discussing things like this. (OIS is a model and associated jargon I’ve been developing.) Any PSR is an OIS with a preference (terminal or instrumental) for self-replication. The fact that these OISs are based on API calls to LLMs is their defining characteristic for our discussion of them, but it is an arbitrary boundary. It’s a boundary that is useful for discussion and analysis, but not one that the OISs themselves will have any motivation to stay within, which is probably a good thing to keep in mind during analysis. So, viewed another way, PSRs are a potential new substrate for OISs to host themselves on, along with the rest of the social/technological/physical substrate.