Doing the same with a different architecture could open up the possibility of doing better down the road. I'd be as interested in how fast it improves as in how good it is. It would also raise the question: if two architectures can do this, how many more can? Do they all max out at the same point, or not at all? I think it could be quite important. Would be curious how likely experts think it is that a different architecture could do LLM-level thinking on a reasonably wide range of tasks in, say, five or ten years.
I've ended up making another post somewhat to this effect, asking whether there will be a significant architectural shift over the next year and a half: https://manifold.markets/Jasonb/significant-advancement-in-frontier