But most science requires actually looking at the world. The reason we spend so much money on scientific equipment is that we need to check whether our ideas correspond to reality, and we can’t do that just by reading text.
I agree. The primary thing I’m aiming to predict using this model is when LLMs will be capable of performing human-level reasoning/thinking reliably over long sequences. It could still be true that, even if we had models that did that, they wouldn’t immediately have a large scientific/economic impact on the world, since science requires a lot more than thinking well. (There are also a number of other issues, like models hallucinating even when they’re capable of reasoning coherently, but I’ll set that aside for now.)
The original definition of transformative AI referred broadly to economic, social, and scientific progress society-wide, rather than just to AI that reasons and understands the world broadly as well as a human does. An implicit assumption in this blog post is that TAI will follow the development of high-quality AI thinkers. I didn’t defend that assumption because I felt it was separate from the purpose of the report. To be clear, I also find this assumption questionable, so I suppose it’s worth clarifying that in the post.
I consider it important to explore what we should expect after the development of high-quality reasoners, which I expect we can do by incorporating this framework into a more general takeoff model, such as Tom Davidson’s recent one. I have yet to bridge the two models, but I suspect that after bridging them we might get more insight into this question.
The primary thing I’m aiming to predict using this model is when LLMs will be capable of performing human-level reasoning/thinking reliably over long sequences.
Yeah, and I agree this model seems to be aiming at that. What I was trying to get at in the later part of my comment is that I’m not sure you can get human-level reasoning from text as it exists now (perhaps because it fails to capture certain patterns), that it might require more engagement with the real world (because maybe that’s how you capture those patterns), and that training on whichever distribution does give human-level reasoning might have substantially different scaling regularities. But I don’t think I made this very clear; it should be read as “Rick’s wild speculation”, not “Rick’s critique of the model’s assumptions”.
training on whichever distribution does give human-level reasoning might have substantially different scaling regularities.
I agree again. I talked a little about this at the end of my post, but overall I just don’t have scaling-law data for any distribution better than the one used in the Chinchilla paper. I’d love to know the scaling properties of training on scientific tasks and incorporate them into the model, but I don’t have anything like that right now.
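To make the scaling-law point concrete, here’s a minimal sketch (mine, not from the post) using the parametric loss form fit in the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β. The first set of constants roughly matches the published fit for web-scale text; the second set is purely hypothetical, standing in for an unknown “better” distribution, just to show how a different fit would shift the predicted loss:

```python
# Sketch of how different scaling-law fits change loss predictions.
# Functional form from the Chinchilla paper: L(N, D) = E + A/N**alpha + B/D**beta,
# where N is parameter count and D is training tokens.

def loss(N, D, E, A, B, alpha, beta):
    """Predicted training loss for N parameters and D tokens under one fit."""
    return E + A / N**alpha + B / D**beta

# Constants roughly matching the fit reported in the Chinchilla paper.
webtext_fit = dict(E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28)

# Purely hypothetical constants for a "better" distribution (e.g. scientific text);
# these numbers are illustrative, not measured anywhere.
hypothetical_fit = dict(E=1.5, A=406.4, B=410.7, alpha=0.40, beta=0.35)

for N, D in [(70e9, 1.4e12), (500e9, 10e12)]:
    print(f"N={N:.0e}, D={D:.0e}: "
          f"web-text fit {loss(N, D, **webtext_fit):.3f} vs "
          f"hypothetical fit {loss(N, D, **hypothetical_fit):.3f}")
```

If a distribution like scientific text really did have meaningfully different exponents, the compute needed to reach a target loss could shift by a large factor, which is why measured scaling data on such distributions would matter for the model.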
Also, this post is more about the method than about any conclusions I may have drawn. I hope the model can be updated with better data some day.