Costs don’t really grow linearly with model size, because utilization drops as you spread a model across many GPUs; i.e., aggregate memory requirements grow superlinearly. Relatedly, in OpenAI’s dataset, model sizes increased <100x while compute increased 300,000x. That’s been updating my views a bit recently.
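To make the superlinearity concrete, here is a toy cost model. All the numbers and the decay rate are hypothetical illustrations, not measurements: assume per-GPU utilization falls by a constant factor each time you double the GPU count, so the GPU-hours needed for a fixed training budget grow superlinearly.

```python
import math

def utilization(n_gpus, decay_per_doubling=0.9):
    """Toy model: fraction of peak throughput achieved on n_gpus,
    decaying by a constant factor per doubling of GPU count."""
    return decay_per_doubling ** math.log2(n_gpus)

def gpu_hours(total_flops, peak_flops_per_gpu_hour, n_gpus):
    """GPU-hours billed = ideal GPU-hours divided by achieved utilization."""
    ideal = total_flops / peak_flops_per_gpu_hour
    return ideal / utilization(n_gpus)

# Under these assumed numbers, spreading the same workload over 64 GPUs
# costs ~1.9x the GPU-hours of running it on one:
# gpu_hours(1e18, 1e15, 64) / gpu_hours(1e18, 1e15, 1) ≈ 1.88
```

The point is only qualitative: any persistent per-doubling utilization loss makes total cost superlinear in GPU count.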
People are trying to solve this with approaches like GPipe, but I don’t know yet whether any approach can scale to many more TPUs than the eight they tried. Communication would be the next bottleneck.
https://ai.googleblog.com/2019/03/introducing-gpipe-open-source-library.html?m=1
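For intuition on why scaling much past 8 partitions is hard: the GPipe paper estimates the pipeline’s idle (“bubble”) time at roughly (K−1)/(M+K−1) for K stages and M micro-batches, so deeper pipelines need proportionally more micro-batches just to stay busy. A quick sketch (numbers illustrative):

```python
def bubble_fraction(stages, microbatches):
    """Approximate idle ('bubble') fraction of a GPipe-style pipeline:
    (K - 1) / (M + K - 1) for K stages and M micro-batches."""
    return (stages - 1) / (microbatches + stages - 1)

# With 8 stages and 32 micro-batches, ~18% of pipeline time is idle;
# at 64 stages with the same 32 micro-batches, ~66% is idle.
# bubble_fraction(8, 32) ≈ 0.18
# bubble_fraction(64, 32) ≈ 0.66
```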
Yes, this is a good point. Nonetheless, for whether social learning is important, the relevant question is the relative efficiency of the “social learning channel” and the “scaled-up GPUs channel”. The social learning channel requires you to work with inputs and outputs only (language and behavior, for humans), while scaled-up GPUs let you work directly with internal representations, so I still expect that social learning won’t be particularly important for AI systems unless they use it to learn from humans.
This doesn’t seem relevant to the question of whether social learning is important? Perhaps you were just stating an interesting related fact, but if you were trying to make a point, I don’t know what it is.
Yep, my comment was about the linear scale-up rather than its implications for social learning.