It might depend on the actual why though.
For example, even a smart model like Mythos, if exposed in pretraining to volumes of data where things get labeled “fake news,” would probably be better off developing alternative heuristics to taking negation at face value during that phase of training.
Could that be generalizing into SDF negation neglect even with better curated documents?
Then additionally we’re a few cycles into models that keep refusing to believe unbelievable real-world events.
Like sure, Ed Sheeran winning gold would be weird.
But the models also need to grapple with the actual timeline throwing out things like “Milla Jovovich released AI memory system” and “US gov labeled Anthropic supply chain risk.”
In the wake of this paper I’m definitely wondering if the combination of overused negation over the last few years and an increasingly wild and unpredictable reality is making negation, as a heuristic, just not very useful even to the most capable models.
“It’s not nothing.”
Same exact experience.
Like it sounds weird, but my biggest takeaway was “this was an outstanding appendix.” (And I don’t mean that as a backhanded compliment!)
I wish all papers were this comprehensive, but the level of effort and the range of alternatives they explored went so far above and beyond that it’s understandable this isn’t the standard, even if one can dream.