I’m so far not impressed with Claude 4s. They are trying to make up superficially plausible stuff for my math questions as fast as possible. Sonnet 3.7, at least, explored a lot of genuinely interesting avenues before making an error. “Making up superficially plausible stuff” sounds like a good strategy for hacking not-very-robust verifiers.
These seem to be even more optimized for the agentic-coder role, and in the absence of strong domain transfer (whether or not that’s a real thing) you should mostly expect them to be at about the same level in other domains, or even worse because of forgetting from continued training. Maybe.
Same experience for a physics question on my end.
Did you try both Opus and Sonnet 4?
Yeah, they both made up some stuff in response to the same question.