evhub comments on Claude 4

evhub 22 May 2025 18:11 UTC
9 points
0
See also some notes on reward hacking on Twitter and in the model card.
- Sheikh Abdur Raheem Ali 22 May 2025 18:54 UTC
  6 points
  0
  Thanks for sharing. I read the entire model card instead of starting with the reward hacking section. The parts I personally found most interesting were sections 4.1.1.5 and 7.3.4. Why isn’t the underperformance of Claude Opus 4 on the AI research evaluation suite considered evidence of potential sandbagging?
  - Zach Stein-Perlman 22 May 2025 19:17 UTC
    4 points
    0
    Links: 4.1.1.5 and 7.3.4.