There are an awful lot of “promising new architectures” being thrown around. Few have demonstrated any notable results whatsoever. Fewer still have demonstrated the ability to compete with transformer LLMs on the kinds of tasks transformer LLMs are well suited for.
It’s basically just Mamba SSM and diffusion models, and they aren’t “better LLMs”. They seem like sidegrades to transformer LLMs at best.
HRMs, for example, seem to do incredibly, suspiciously well on certain kinds of puzzles, but I have yet to see them do anything in the language domain, or in math, coding, etc. Are HRMs generalists, like transformers? No evidence of that yet.
> Concretely, these are the developments I am predicting within the next six months (i.e. before Feb 1st 2026) with ~75% probability:

Basically, off the top of my head: I’d put 10% on that. Too short a timeframe.
SSMs are really quite similar to transformers. As with all the “sub-quadratic” transformer variants, the expectation is at best that they will do the same thing as transformers, only more efficiently.
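To make the “same thing, only more efficiently” point concrete, here is a minimal numpy sketch (mine, purely illustrative; real Mamba adds input-dependent gating and a hardware-aware parallel scan). Both layers map a length-n sequence to a length-n sequence, but attention scores all n² pairs while the SSM runs a linear-time recurrence over a fixed-size state:

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: O(n^2) in sequence length n."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (n, n) pairwise scores
    mask = np.tril(np.ones(scores.shape, dtype=bool))  # position i sees <= i
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v     # (n, d)

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence, O(n): h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    y = np.empty((x.shape[0], C.shape[0]))
    for t in range(x.shape[0]):       # single pass, constant-size state
        h = A @ h + B @ x[t]
        y[t] = C @ h
    return y                          # (n, d)

rng = np.random.default_rng(0)
n, d, s = 8, 4, 16                    # toy sizes: seq len, model dim, state dim
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
A, B, C = 0.9 * np.eye(s), rng.standard_normal((s, d)), rng.standard_normal((d, s))
print(attention(x, Wq, Wk, Wv).shape, ssm_scan(x, A, B, C).shape)  # (8, 4) (8, 4)
```

The interface is identical, which is why the best case is “transformer-equivalent, but cheaper” rather than a new kind of ability.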
HRMs, continuous thought machines, and KANs, on the other hand, contain new and different ideas that make a discontinuous jump in abilities at least conceivable. So I think one should distinguish between those two types of “promising new architectures”.
My view is that these new ideas accumulate, and at some point somebody will be able to put them together in a new way to build actual AGI.
But the authors of these papers are not stupid. If there were straightforward applicability to language modelling, they would already have done that. If there were line of sight to GPT-4-level abilities in six months, they probably wouldn’t publish the paper.
KANs seem obviously of limited utility to me...?
I think it is a cool idea and has its applications, but you are right that it seems very unlikely to contribute to AGI in any way. There was nonetheless excitement about integrating KANs into transformers, which was easy to do but just didn’t improve anything.
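For what it’s worth, the “easy to do” part looks roughly like this: the transformer block’s two-matrix MLP gets swapped for a layer in which every input-output edge carries its own learnable univariate function. A hypothetical numpy sketch, with a fixed radial-basis expansion standing in for the B-splines of the actual KAN paper (all names and shapes here are mine):

```python
import numpy as np

class KANLayer:
    """Simplified KAN-style layer: y_j = sum_i phi_ij(x_i), where each edge
    function phi_ij is a learned mix of K fixed radial basis functions.
    (The actual KAN paper uses B-splines plus a SiLU term; this is a stand-in.)
    """
    def __init__(self, d_in, d_out, K=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(-2.0, 2.0, K)                 # shared basis grid
        self.coef = 0.1 * rng.standard_normal((d_in, d_out, K))  # per-edge weights

    def __call__(self, x):                                    # x: (n, d_in)
        basis = np.exp(-(x[..., None] - self.centers) ** 2)   # (n, d_in, K)
        return np.einsum('nik,iok->no', basis, self.coef)     # (n, d_out)

# Drop-in for the usual d_model -> d_hidden -> d_model MLP block:
d_model, d_hidden = 16, 32
up, down = KANLayer(d_model, d_hidden), KANLayer(d_hidden, d_model)
x = np.random.default_rng(1).standard_normal((4, d_model))
print(down(up(x)).shape)  # (4, 16), same interface as the MLP it replaces
```

Same shapes in, same shapes out, so the swap itself is trivial; whether the per-edge functions buy anything at transformer scale is exactly the part that didn’t pan out.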
Ah, but is it only a point-in-time sidegrade, with a faster capability curve ahead? At the scale we are working at now, even a marginal efficiency improvement threatens to considerably accelerate at least the conventional concerns (power concentration, job loss, etc.).
It’s my impression that a lot of the “promising new architectures” are indeed promising. IMO a lot of them could compete with transformers if you invest in them. It just isn’t worth the risk while the transformer gold-mine is still open. Why do you disagree?
I disagree because I have yet to see any of those “promising new architectures” outperform even something like GPT-2 345M, weight for weight, at similar tasks. Or show similar performance with a radical reduction in dataset size. Or anything of the sort.
I don’t doubt that a better architecture than the LLM is possible. But if we’re talking AGI, then we need an actual general architecture. Not a narrow AI that destroys a specific benchmark, but a more general-purpose AI that happens to do reasonably well on a variety of benchmarks it wasn’t purposefully trained for.
We aren’t exactly swimming in that kind of thing.