Unfortunately they only extended the scaling curves to ~10B tokens, roughly 3 OOMs less than the data used to train frontier models. So it's unclear whether this will work at scale, and the fact that they didn't extend it further is some evidence against it working.
You seem to report one OOM less than this picture in https://alexiglad.github.io/blog/2025/ebt/#:~:text=a%20log%20function).-,Figure%208,-%3A%20Scaling%20for
Interesting, I was looking at Figure 7, but that seems to be a much smaller run. I retract my original comment.