I worked on geometric/equivariant deep learning a few years ago (with some success, leading to two ICLR papers and a patent, see my google scholar: https://scholar.google.com/citations?user=E3ae_sMAAAAJ&hl=en).
The type of research I did was very reasoning-heavy. It’s architecture research in which you think hard about how to mathematically guarantee that your network obeys some symmetry constraints appropriate for a domain and data source.
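To make this concrete, here is a minimal, illustrative sketch of what “mathematically guaranteeing a symmetry constraint” can look like, using the standard Deep-Sets-style permutation-equivariant set layer (a textbook construction for illustration, not code from my own papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_equivariant_layer(x, w_self, w_mean, b):
    """A linear layer on a set of n feature vectors (shape n x d_in) that is
    permutation-equivariant by construction: every element gets the same
    per-element transform, plus a term that depends only on the
    permutation-invariant mean over the set."""
    return x @ w_self + np.mean(x, axis=0, keepdims=True) @ w_mean + b

n, d_in, d_out = 5, 3, 4
x = rng.normal(size=(n, d_in))
w_self = rng.normal(size=(d_in, d_out))
w_mean = rng.normal(size=(d_in, d_out))
b = rng.normal(size=(1, d_out))

perm = rng.permutation(n)
out_then_perm = permutation_equivariant_layer(x, w_self, w_mean, b)[perm]
perm_then_out = permutation_equivariant_layer(x[perm], w_self, w_mean, b)

# Equivariance holds for arbitrary weights: it is a property of the
# architecture itself, not something the network has to learn from data.
assert np.allclose(out_then_perm, perm_then_out)
```

The point of this style of work is that the symmetry is enforced by the layer’s structure, so no amount of training can violate it.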
As a researcher in that area, you have a very strong incentive to claim that a special sauce is necessary for intelligence, since providing special sauces is all you do. As such, my prior is to believe that these researchers don’t have any interesting objection to continued scaling and “normal” algorithmic improvements to lead to AGI and then superintelligence.
It might still be interesting to engage when the opportunity arises, but I wouldn’t put extra effort into making such a discussion happen.
Interesting!
I definitely see your point about how the incentives here are skewed. Still, I’d like to ask what you think of the claims about inductive biases and the difficulty of causal graph learning for transformers? My guess is that you could just add that on top of the base architecture, e.g. as an MoA model with RL in the loop, to solve some of the problems here, but it feels like people at the larger labs might not realise that at first?
Also, I wasn’t only talking about GDL; there are two or three other disciplines that also have their own reasons to believe that AGI will need other sorts of modelling capacity.
Some of the organisations taking explicit bets from other directions are:
https://www.symbolica.ai/
https://www.verses.ai/genius
Symbolica is on the same train as GDL but from a category theory perspective. The TL;DR of their take is that combining various data types into one model requires other kinds of reasoning capacity, and that transformers are neither expressive nor flexible enough to support this.
For Verses, think ACS & Jan Kulveit’s Active Inference models: the claim is that auto-encoders lack information about where the self-other boundary lies, and so can’t plan with a self in mind, in contrast to something that has an action-perception loop.
I might write something up on this if you think it might be useful.
Thanks for these further pointers! I won’t go into detail; I’ll just say that I take the bitter lesson very seriously and that I think most of the ideas you mention won’t be needed for superintelligence. Some intuitions for why I don’t take typical arguments about the limits of transformers very seriously:
If you hook up a transformer to itself with a reasoning scratchpad, then I think it can in principle represent any computation, beyond what would be possible in a single forward pass (see the first sketch after this list).
On causality: once we change to the agent paradigm, transformers naturally get causal data, since they will see how the “world responds” to their actions (see the second sketch below).
General background intuition: Humans developed general intelligence and a causal understanding of the world by evolution, without anyone designing us very deliberately.
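To gesture at the scratchpad point with a toy (the `model` function below is a stand-in for a single bounded transformer forward pass, not any real API): a fixed-depth network can only do a bounded amount of work per call, but looping it over its own growing output lets the same network carry out an arbitrarily long iterative computation, here the parity of a long bit string, one bit per “forward pass”.

```python
def model(scratchpad: str, next_bit: str) -> str:
    # Stand-in for one transformer forward pass: a bounded computation that
    # reads the current scratchpad and one new input token, then appends an
    # updated running-parity bit to the scratchpad.
    state = int(scratchpad[-1])
    return scratchpad + str(state ^ int(next_bit))

def parity_with_scratchpad(bits: str) -> int:
    # The outer loop is the "hook the transformer up to itself" part: the
    # model's own output becomes part of its next input, so total compute
    # grows with the number of steps rather than with network depth.
    scratchpad = "0"
    for bit in bits:
        scratchpad = model(scratchpad, bit)
    return int(scratchpad[-1])

assert parity_with_scratchpad("1101001") == 0  # four 1s, so parity is 0
```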
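And a similarly hedged toy for the causality point (the environment and the uniform-random “policy” are made up for illustration): an agent that picks its own actions collects interventional (action, outcome) pairs, which is exactly the kind of data from which causal effects are straightforward to estimate.

```python
import random

class TwoLeverEnv:
    """Toy world: pulling lever 1 causally raises the chance of a reward,
    pulling lever 0 mostly does not."""
    def step(self, action: int) -> int:
        p = 0.9 if action == 1 else 0.1
        return int(random.random() < p)

env = TwoLeverEnv()
data = []
for _ in range(1000):
    action = random.choice([0, 1])  # chosen by the agent, i.e. do(action)
    outcome = env.step(action)      # the world responds to the action
    data.append((action, outcome))

def mean_outcome(pairs, a):
    outcomes = [o for act, o in pairs if act == a]
    return sum(outcomes) / len(outcomes)

# With interventional data, the causal effect is just a difference of means;
# no special architecture for causal graph discovery is needed in this toy.
effect = mean_outcome(data, 1) - mean_outcome(data, 0)
print(f"estimated causal effect of pulling lever 1: {effect:.2f}")  # about 0.8
```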