I can somewhat see where you’re coming from about a new method being orders of magnitude more data-efficient in RL, but I very strongly bet on transformers remaining core even after such a paradigm shift. I’m curious whether you think the transformer architecture and text input/output need to go, or whether the new training procedure / architecture fits in with transformers because transformers are just the best information-mixing architecture.
My guess is that the main issue with current transformers will turn out to be that they don’t have a long-term state/memory, and I think that’s a pretty critical part of how humans are able to learn on the job as effectively as they do.
The trouble, as I’ve heard it, is that the approaches which do incorporate a long-term state/memory are apparently much harder to train well than transformers, plus transformers benefit from first-mover effects.
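To make the distinction concrete, here is a minimal sketch (assuming PyTorch; the specific layer choices are just illustrative, not anyone’s proposed architecture): a transformer layer is stateless across calls and only ever sees whatever context you hand it, while a recurrent cell threads a persistent hidden state through time, which is one simple way to get the kind of long-term memory being discussed.

```python
import torch
import torch.nn as nn

d_model = 64

# Stateless: each forward pass attends only over the tokens passed in.
# Anything that has fallen out of the context window is simply gone.
attn_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
tokens = torch.randn(1, 128, d_model)        # (batch, seq_len, d_model)
out = attn_layer(tokens)                     # no state survives this call

# Stateful: a GRU cell carries a hidden vector forward step by step, so
# information can persist indefinitely -- but gradients must flow through
# every step, which is part of why such models are harder to train at scale.
rnn_cell = nn.GRUCell(input_size=d_model, hidden_size=d_model)
h = torch.zeros(1, d_model)                  # the long-lived state
for t in range(tokens.shape[1]):
    h = rnn_cell(tokens[:, t, :], h)         # h accumulates memory across steps
```

This is only meant to illustrate the stateless-vs-stateful contrast; hybrid designs (state-space models, recurrent memory layers bolted onto transformers, etc.) sit somewhere in between.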