ryan_greenblatt comments on ryan_greenblatt’s Shortform

ryan_greenblatt 12 Sep 2025 5:10 UTC
- Anthropic reports SWE-bench scores without reasoning, which is some evidence that reasoning doesn’t help (much) on this sort of task. (See e.g. the release blog post for Opus 4.)
- Anecdotal evidence
Probably it would be more accurate to say “doesn’t seem to help much, while it helps a lot for OpenAI models”.