Lev McKinney comments on Negation Neglect: When models fail to learn negations in training

Lev McKinney 20 May 2026 0:49 UTC
I don’t think it’s that similar. If I recall correctly, the Waluigi effect claims that learning an HHH-aligned model reduces the code length needed to specify an evil “Waluigi” persona. I think the only similarity is that the negation of a fact also has to encode the fact it negates, which does reduce that fact’s code length.