Thanks. I am uncertain (“unclear”), and am interested in sharpening this to the point where it’s testable.
I basically never use a non-RLed model for anything, so I agree with the minimal version of the generalisation claim.
We could just reuse some transfer learning metric? If 100% is full proportional improvement, I’d claim like <10% spillover on nonverified tasks. What about you?
Another thing I was trying to point at: I don’t know what RL environments they’re using for these things, and so I don’t know which tasks count in the denominator. And I’m not going to find out, either.
Seems like Claude has been getting better at playing Pokemon, despite not having been trained on any sort of Pokemon game at all. (Epistemic status: Not sure actually, we don’t know what Anthropic does internally, maybe they’ve trained it on video games for all we know. But I don’t think they have.)
Isn’t this therefore an example of transfer/generalization?
What transfer learning metrics do you have in mind?
My perhaps overcynical take is to assume that any benchmark which gets talked about a lot is being optimised. (The ridiculously elaborate scaffold already exists for Pokemon, so why wouldn’t you train on it?) But I would update on an explicit denial.
I was guessing that the transfer learning people would already have some handy coefficient (normalised improvement on nonverifiable tasks / normalised improvement on verifiable tasks) but a quick look doesn’t turn it up.
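For concreteness, here’s a minimal sketch in Python of the coefficient I have in mind. Everything in it is invented for illustration (the headroom normalisation, the task pairing, the scores); it’s not a standard metric from the transfer learning literature:

```python
# Sketch of the proposed coefficient:
#   spillover = normalised improvement on non-verifiable tasks
#             / normalised improvement on verifiable tasks
# where "normalised improvement" = fraction of remaining headroom closed.
# All numbers below are hypothetical.

def normalised_improvement(before: float, after: float, ceiling: float = 1.0) -> float:
    """Fraction of the headroom (ceiling - before) that training closed."""
    return (after - before) / (ceiling - before)

def spillover(ver_before: float, ver_after: float,
              nonver_before: float, nonver_after: float) -> float:
    """1.0 = full proportional transfer to non-verifiable tasks; 0.0 = none."""
    return (normalised_improvement(nonver_before, nonver_after)
            / normalised_improvement(ver_before, ver_after))

# Hypothetical: RL on verifiable math moves accuracy 0.60 -> 0.80, while a
# non-verifiable judgment task only moves 0.60 -> 0.615 over the same run.
print(spillover(0.60, 0.80, 0.60, 0.615))  # 0.075, i.e. ~7.5% spillover
```

On those made-up numbers the coefficient comes out at 0.075, the sort of <10% spillover I claimed above.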
It still says on the Twitch stream: “Claude has never been trained to play any Pokemon games.”
https://www.twitch.tv/claudeplayspokemon
Works for me!
Possibly relevant, possibly hallucinated data: https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=sBtoCfWNnNxxGEgiL
I suppose there are two questions here:
1. How strong is generalization in general in RL?
2. Is there a ‘generalization barrier’ between easy-to-verify and hard-to-verify tasks?
I’m guessing you’re mainly thinking of (1), with (2) as a special case?
To respond to your question, I’m reading it as:
We assume that there’s a constant multiplier in samples-to-performance needed to match in-domain training with out-of-domain training. For ‘nearby’ verifiable and non-verifiable tasks, is that constant >= 10x?
I would guess modally somewhere 3-10x. I’m imagining here comparing training on more olympiad problems vs. some looser question like ‘Compare the clarity of these two proofs’. Of course there are diminishing returns etc., so it’s not really a constant factor when taking a narrow domain.
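Read that way, the multiplier is measurable given learning curves for both conditions. Here’s a minimal sketch of how I’d estimate it; the curves, tasks, and numbers are all hypothetical:

```python
# Sketch of estimating the samples-to-performance multiplier: fix a target
# score, then compare how many training samples each condition needs to
# reach it, interpolating in log-sample space. All data points are made up.

import numpy as np

def samples_to_reach(target, samples, scores):
    """Interpolate, in log-sample space, the sample count at which a
    monotonically increasing learning curve first reaches `target`."""
    return float(np.exp(np.interp(target, scores, np.log(samples))))

# Hypothetical learning curves: (training samples, benchmark score).
samples = np.array([1e3, 1e4, 1e5, 1e6])
in_domain = np.array([0.50, 0.62, 0.74, 0.86])   # RL on olympiad problems
out_domain = np.array([0.47, 0.56, 0.68, 0.80])  # RL on a 'nearby' proof-clarity task

target = 0.60
multiplier = (samples_to_reach(target, samples, out_domain)
              / samples_to_reach(target, samples, in_domain))
print(f"~{multiplier:.1f}x more out-of-domain samples to reach {target}")
```

On these invented curves the estimate lands around 3x, the low end of my guess; real curves bend, so the ‘constant’ would drift with the target score you pick.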
I do agree that there are areas where domain-specific training is a bottleneck, and plausibly some of those are non-verifiable ones. See also my shortform, where I discuss some reasons for such a need: https://www.lesswrong.com/posts/FQAr3afEZ9ehhssmN/jacob-pfau-s-shortform?commentId=vdBjv3frxvFincwvz