shawnghu comments on Will Any Crap Cause Emergent Misalignment?

shawnghu 30 Aug 2025 21:55 UTC
4 points
3
I wonder if you’re referring to the “spurious rewards” paper. If so, I wonder if you’re aware of [this critique](https://safe-lip-9a8.notion.site/Incorrect-Baseline-Evaluations-Call-into-Question-Recent-LLM-RL-Claims-2012f1fbf0ee8094ab8ded1953c15a37) of its methodology, which might be enough to void the result.
- Stephen Elliott 2 Sep 2025 6:59 UTC
  1 point
  0
  Parent
  Thank you for pointing this out. It’s easy to miss errors like this. More and more, I think it is necessary to read only papers from the main conferences. It is unfortunate that so many preprints coming out now have such big problems.
  - shawnghu 4 Sep 2025 0:57 UTC
    2 points
    1
    Parent
    Yeah, whenever a result is sensational and comes from a less-than-absolutely-huge name, my prior is that the result is due to mistakes (like 60-95%, depending on how surprising it is), and de facto this means I just don’t update on papers like this one anymore until significant follow-up work is done.