In this post, I’m talking about deceptive alignment. The threat model you’re talking about here doesn’t really count as deceptive alignment, because the organisms weren’t explicitly using a world model to optimize their choices to cause the bad outcome. AIs like that might still be a problem (indeed, I think deceptively aligned AI probably contributes less than half of my P(doom from AI)), but I think we should think about them somewhat separately from deceptively aligned models, because they pose risk via somewhat different mechanisms.