technicalities comments on AI in 2025: gestalt

technicalities 8 Dec 2025 20:33 UTC
5 points
My perhaps overcynical take is to assume that any benchmark which gets talked about a lot is being optimised for. (The ridiculously elaborate scaffold already exists for Pokemon, so why wouldn’t you train on it?) But I would update on an explicit denial.
I was guessing that the transfer learning people would already have some handy coefficient (normalised improvement on nonverifiable tasks / normalised improvement on verifiable tasks), but a quick look doesn’t turn one up.
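One way the ratio above could be written down, as a rough sketch only: the symbols and the choice of normalisation (fraction of remaining headroom closed) are assumptions for illustration, not an established coefficient from the transfer learning literature.

```latex
% Hypothetical notation, not taken from any existing paper:
%   s_X^{pre}, s_X^{post}: score on task family X before and after training
%   s_X^{max}: maximum achievable score on task family X
%   \tilde{\Delta}_X: normalised improvement (share of headroom closed)
%   \tau: the transfer coefficient described in the comment
\[
\tilde{\Delta}_X = \frac{s_X^{\text{post}} - s_X^{\text{pre}}}{s_X^{\max} - s_X^{\text{pre}}},
\qquad
\tau = \frac{\tilde{\Delta}_{\text{nonverifiable}}}{\tilde{\Delta}_{\text{verifiable}}}
\]
```

Under these assumptions, a value of τ near 1 would mean gains on verifiable tasks carry over almost fully to nonverifiable ones, and a value near 0 would mean they barely carry over. It is undefined when there is no improvement on the verifiable tasks.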
- Daniel Kokotajlo 8 Dec 2025 20:49 UTC
  5 points
  It still says on the Twitch stream: “Claude has never been trained to play any Pokemon games.”
  
  https://www.twitch.tv/claudeplayspokemon
  - technicalities 8 Dec 2025 21:00 UTC
    4 points
    Works for me!
    - Daniel Kokotajlo 8 Dec 2025 21:34 UTC
      3 points
      Possibly relevant, possibly hallucinated data: https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=sBtoCfWNnNxxGEgiL