I had a notification ping in my brain just now while using Claude Code, when I realized I'd just told it to think for a long time: I don't think the claim is true, because it doesn't match my experience.
Anthropic reports SWE-bench scores without reasoning, which is some evidence it doesn't help (much) on this sort of task. (See e.g. the release blog post for Opus 4.)
Anecdotal evidence
Probably it would be more accurate to say "doesn't seem to help much for Anthropic models, while it helps a lot for OpenAI models".
Is there a standard citation for this?
How do you come by this fact?