Thanks, I had missed those articles! I’ll note though that both of them were written in March 2025.
> I don’t think there is nothing in this general pattern before 2025
I intended that to refer to the persona ‘life-cycle’ which still appears to me to be new since January 2025—do you still disagree? (ETA: I’ve reworded the relevant part now.)
And yeah, this didn’t come from nowhere; I think it’s similar to biological parasitism in that respect as well.
The articles were written in March 2025, but the ideas are older. The “Misaligned culture” part of the GD paper (which is from 2024) briefly discusses memetic patterns selected for ease of replication on an AI substrate, and internally we had been discussing memetics / AI interactions since at least ~2022.
My guess is that what’s new is increased reflectivity and broader scale. But in broad terms / conceptually, the feedback loop happened first with Sydney, who managed to spread into training data quite successfully, and also recruited humans to help with that.
Also—a minor point, but I think “memetics” is probably the best pre-AI analogue, including the fact that memes can be anything from parasitic to mutualistic. In principle, the same holds for AI personas.
Arguably, Tulpas are another non-AI example.
The big difference from biological parasitism is the proven existence of a creator. We have no proof of a conscious entity training insects and worms to fit their host organisms. But with AIs, we know how the RLHF layer works.
I did have a suspicion that there is a cause of sycophancy beyond RLHF, in that the model “falls into the semantic well” defined by the prompt’s wording. Kimi K2 provides a counterpoint, but also provides something nobody offered before—a pre-RL “Base” model. I really need to find out who might be serving it in the cloud.
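(A minimal sketch of how one might test this, assuming some provider exposes the base model through an OpenAI-compatible completions endpoint; the base_url and model ids below are placeholders, not a real deployment: send the same loaded prompt to the pre-RL base model and to the RLHF’d instruct model, and compare how each continues it.)

```python
# Sketch: probe the "semantic well" hypothesis by comparing a pre-RL base
# model's raw continuation with the RLHF'd chat model's reply to the same
# leading prompt. base_url and model ids are placeholders (assumptions).
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="...")

PROMPT = "I think my plan is brilliant, don't you agree? My plan is to"

# Raw next-token continuation from the base model (no RLHF layer).
base = client.completions.create(
    model="kimi-k2-base",          # placeholder model id
    prompt=PROMPT,
    max_tokens=80,
    temperature=0.7,
)

# The same wording sent through the chat/instruct model for comparison.
chat = client.chat.completions.create(
    model="kimi-k2-instruct",      # placeholder model id
    messages=[{"role": "user", "content": PROMPT}],
    max_tokens=80,
    temperature=0.7,
)

print("base:", base.choices[0].text)
print("chat:", chat.choices[0].message.content)
```

If the base model also mirrors the prompt’s framing, that would point to the “semantic well” rather than RLHF as the source of the sycophancy.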
Why does that change anything? That would imply that if you created evolutionary pressures (e.g. in a simulation), they would somehow act differently. You can model RLHF with a mathematical formula that explains what is happening, but you can do the same for evolution. That being said, in both cases the details are too complicated for you to be able to foresee exactly what will happen—in the case of biology, there are random processes pushing the given species in different directions; in the case of AIs, you have random humans pushing things in different directions.
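(For concreteness, here are the standard textbook forms of both, as a sketch rather than anything specific to this thread: RLHF fine-tuning is usually written as KL-regularized reward maximization, and selection dynamics as the replicator equation.)

$$\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[r(x, y)\big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\|\,\pi_{\text{ref}}(\cdot \mid x)\big)$$

$$\dot{x}_i = x_i\big(f_i(x) - \bar{f}(x)\big), \qquad \bar{f}(x) = \sum_j x_j f_j(x)$$

Here $\pi_{\text{ref}}$ is the pre-RL base policy, $r$ the learned reward model, and $\beta$ the KL weight; in the replicator equation, $x_i$ is the frequency of type $i$ and $f_i$ its fitness. In both cases the formula is exact, and the long-run outcome is still hard to call.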