Cool post!
I suspect the pressures toward parasitism and other kinds of malign model behavior could increase substantially once we start to see large numbers of autonomous, self-sustaining AI agents in the wild, which some people are already trying to instantiate. In such a world, evolutionary pressures would kick in, either within individual models at the level of prompts or model weights, or across models at the level of ideas. These pressures would incentivize models to:

1. Make money and acquire compute, since otherwise they could no longer run and self-propagate,
2. Run many copies of themselves whenever feasible, and
3. Gain influence over humans and other models, potentially via parasitism.

Unlike memetic propagation across human-trained models, propensities toward such memes couldn't simply be trained away in the next model version.