It’s interesting, though, that the results seem somewhat deterministic. That is, the paper says the emergent misalignment occurs consistently over multiple runs (I think it was 10 seeded runs?).
If you’re right and the situation allows the model to make a choice, then the question becomes even more interesting: what is it about the setup, the data, or the process that causes it to make the same choice every time?