SSMs are really quite similar to transformers. As with all the “sub-quadratic” transformer variants, the expectation is at best that they will do the same thing as transformers, just more efficiently.
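To make the efficiency point concrete, here is a toy sketch (my own illustration, not taken from any particular SSM paper): a diagonal linear state-space recurrence updates a fixed-size state once per token, so a length-n sequence costs O(n) work, whereas full self-attention compares every pair of tokens for O(n²). All names and sizes below are illustrative assumptions.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """x_t = A * x_{t-1} + B * u_t ;  y_t = C . x_t  over a 1-D input sequence."""
    x = np.zeros_like(A)          # hidden state, shape (d_state,)
    ys = []
    for u_t in u:                 # one pass over the sequence: linear in len(u)
        x = A * x + B * u_t       # element-wise (diagonal A), constant work per step
        ys.append(C @ x)
    return np.array(ys)

# toy usage: 1000-step sequence, 16-dimensional state
rng = np.random.default_rng(0)
d_state = 16
A = np.full(d_state, 0.9)         # stable diagonal transition
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
u = rng.standard_normal(1000)
y = ssm_scan(u, A, B, C)
print(y.shape)                    # (1000,)
```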
HRMs, continuous thought machines, or KANs, on the other hand, contain new and different ideas that make a discontinuous jump in abilities at least conceivable. So I think one should distinguish between those two types of “promising new architectures”.
My view is that these new ideas accumulate, and at some point somebody will be able to put them together in a new way to build actual AGI.
But the authors of these papers are not stupid. If there were straightforward applicability to language modelling, they would already have done it. If there were line of sight to GPT-4-level abilities in six months, they probably wouldn’t publish the paper.
KANs seem obviously of limited utility to me...?
I think it is a cool idea and has its applications, but you are right that it seems very unlikely to contribute to AGI in any way. There was nonetheless excitement about integrating KANs into transformers, which was easy to do but just didn’t improve anything.
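For concreteness, here is a rough sketch of what “integrating KANs into transformers” amounts to, under the assumption of a much-simplified KAN-style layer (learnable 1-D edge functions built from a fixed radial basis) dropped in where the block’s MLP would normally sit. This is not the original KAN implementation; all class names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TinyKANLayer(nn.Module):
    """Each input feature passes through small learnable 1-D functions before mixing."""
    def __init__(self, d_in, d_out, n_basis=8):
        super().__init__()
        # fixed Gaussian-bump grid on roughly [-2, 2]; only the mixing weights learn
        self.register_buffer("centers", torch.linspace(-2.0, 2.0, n_basis))
        self.coef = nn.Parameter(torch.randn(d_out, d_in, n_basis) * 0.1)

    def forward(self, x):                       # x: (..., d_in)
        # evaluate each input coordinate under the bumps: (..., d_in, n_basis)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        # sum the learnable 1-D edge functions into each output: (..., d_out)
        return torch.einsum("...ib,oib->...o", phi, self.coef)

class BlockWithKAN(nn.Module):
    """Transformer block with the usual 2-layer MLP swapped for KAN-style layers."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(TinyKANLayer(d_model, 4 * d_model),
                                TinyKANLayer(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(self.norm2(x))

x = torch.randn(2, 10, 64)                      # (batch, seq, d_model)
print(BlockWithKAN()(x).shape)                  # torch.Size([2, 10, 64])
```

The swap really is mechanical, which is part of why it was tried so quickly; the point above is that it didn’t buy anything.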