My first impression, before you specifically noted that someone meant to say “emergent misalignment,” was that it was just another way of gesturing at the same idea. I don’t see why that’s wrong: in the example of training on bad code/malware and the model then becoming more likely to endorse Nazism, I think that’s reasonably described as unintentional generalization. Some people might want specific guardrails removed without twisting the model’s stance on other topics. For example, if I wanted a model to write malware for me, I would not particularly want or intend for it to change political alignment.
I tested on Gemini 3 Pro, and it gave a lengthy answer. When asked to summarize:
>Emergent misgeneralization occurs when an AI model learns a proxy objective that correlates with the intended goal during training but causes the model to pursue incorrect behaviors when deployed in new environments. This failure is distinct because it remains latent until the system gains sufficient capability to distinguish the proxy from the true goal and competently execute the flawed objective.
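To make that summary concrete, here is a minimal toy sketch (my own illustration, not code from any of the papers; all names like `is_yellow` are made up) of a proxy objective that fits training data perfectly but fails at deployment:

```python
# Toy goal misgeneralization: during training, the true goal always
# co-occurs with a proxy feature (say, "the goal tile is yellow"), so a
# rule based on either feature predicts reward equally well.
train = [({"is_goal": 1, "is_yellow": 1}, 1.0),
         ({"is_goal": 0, "is_yellow": 0}, 0.0)]

def fits(feature, data):
    """A rule 'pursue cells where feature == 1' fits if it predicts reward exactly."""
    return all(float(obs[feature]) == reward for obs, reward in data)

# Both rules fit the training data, so the learner has no basis to
# prefer the intended objective; suppose it latches onto the proxy.
assert fits("is_goal", train) and fits("is_yellow", train)
policy_feature = "is_yellow"   # the learned (mis-generalized) objective

# Deployment: the correlation breaks -- a yellow distractor tile appears.
deploy = {"is_goal": 0, "is_yellow": 1}
chosen = bool(deploy[policy_feature])   # policy competently pursues... yellow
print("pursues cell:", chosen, "| actually the goal:", bool(deploy["is_goal"]))
```

The failure stays latent exactly as the summary says: nothing in training distinguishes the proxy from the true goal, and the flawed objective only shows up once deployment decorrelates the two features.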
I also checked the sources for the original reply; it was clearly quoting and referencing articles on emergent misalignment, such as:
https://pmc.ncbi.nlm.nih.gov/articles/PMC12804084/?hl=en-US
In short, I don’t see anything wrong with the reply. The only real critique I have is that it could have gently noted that there’s a more established term, but even that one is practically brand new.
The replies are mashups of the papers “Emergent Misalignment” and “Goal Misgeneralization,” but the base meaning of both titles is carried by “goal” and “alignment,” while “emergent” and “misgeneralization” are modifiers, making a claim about how the base phenomenon occurred. “Emergent goals” or “alignment misgeneralization” would be valid names for concepts in the same space, but “emergent misgeneralization” is a nonsense phrase, like calling a mashup of a steam engine and a solid-fuel rocket a “solid-fuel steam” or an “engine rocket.”
It is tricky, though! Very much in the weeds. The models get defensive about it and insist it’s a term of art present in one or both of the papers mentioned above.
What do they actually say?