Assume that, as a consequence of being in the Paul-verse, regulatory and other practical obstacles can be overcome very cost-effectively. In this world, how much value do current language models create?
I would answer that in this obstacle-free world, they would create about 10% of global GDP, and this share would be rapidly increasing. This is because a large set of valuable tasks are both simple enough for models to understand and possible to transform into a prompt-completion task.
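To make “transform into a prompt-completion task” concrete, here is a minimal sketch; the `complete` function is a hypothetical stand-in for any text-completion API, and the support-ticket task is just an illustrative example:

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a language-model completion call."""
    raise NotImplementedError("plug in a real model here")


def draft_support_reply(ticket: str, policy: str) -> str:
    """Cast a customer-support task as a single prompt completion."""
    # The hard part in practice is gathering `ticket` and `policy`
    # (the relevant information) and routing the reply back into the
    # support system, not writing this prompt.
    prompt = (
        f"Company policy:\n{policy}\n\n"
        f"Customer message:\n{ticket}\n\n"
        "Write a short, polite reply consistent with the policy:\n"
    )
    return complete(prompt)
```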
I don’t agree with that at all. I think in this counterfactual world current language models would create about as much value as they create now, maybe more by some factor, but most likely not by an order of magnitude or more.
The argument is meant as a reductio: Language models don’t create value in our world, so the obstacles must be hard to overcome, so we are not in the Paul-verse.
I know that this is your argument. For me, the conclusion implied by “language models don’t create value in our world” is “language models are not capable of creating value in our world & we’re not capable of using them to create value”, not “the practical obstacles are hard to overcome”. Also, this last claim about “practical obstacles” is very vague: if you can’t currently buy a cheap ticket to Mars, is that a case of “practical obstacles being difficult to overcome” or not?
In some sense there is likely a billion-dollar company idea that builds on existing language models, so if someone thought of the idea and had the right group of people to implement it, they could be generating a lot of revenue. This would look very different from language models creating 10% of GDP, however.
I claim that most coordination tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all the relevant information into the prompts and transferring the completions back into “the real world”.
I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards.
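To make the claimed mechanism concrete, here is a minimal sketch of two models coordinating by exchanging messages. Everything here is an illustrative assumption: `complete` is again a hypothetical stand-in for a completion API, and the alternating-turn loop is just one way to wire the models together.

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a language-model completion call."""
    raise NotImplementedError("plug in a real model here")


def negotiate(goal_a: str, goal_b: str, rounds: int = 6) -> list[str]:
    """Two model instances take turns working toward a shared plan."""
    transcript: list[str] = []
    for turn in range(rounds):
        # Alternate speakers: even turns argue for goal_a, odd for goal_b.
        goal = goal_a if turn % 2 == 0 else goal_b
        prompt = (
            f"Your goal: {goal}\n"
            "Conversation so far:\n"
            + "\n".join(transcript)
            + "\nYour next message (propose or accept a concrete plan):\n"
        )
        transcript.append(complete(prompt))
    return transcript
```

The loop itself is trivial; the enormous obstacle in the claim lives entirely outside it, in filling the prompts with the relevant real-world state and acting on the completions.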
Regarding the bet: Even odds sounds like easy money to me, so you’re on :). I weakly expect that my winning criteria will never come to pass, as we will be dead.
I’ll be happy to claim victory when AGI is here and we’re not all dead.
Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would affect GDP only very slowly. How long do you think it would take before language models did >50% of all coordination tasks?
2 standard deviations above the human average with respect to what metric? My whole point is that the metrics people look at in ML papers are not necessarily relevant in the real world, and/or that real-world impact (say, revenue generated by the models) is a discontinuous function of these metrics.
I would guess that 2 standard deviations above the human average on commonly used language-modeling benchmarks is still far from enough for even 10% of coordination tasks, though by that point models could well be generating plenty of revenue.
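For calibration, and assuming (strongly) that the benchmark score is normally distributed across humans, 2 standard deviations above the mean corresponds to roughly the 97.7th percentile:

```python
from statistics import NormalDist

# Percentile of a score 2 standard deviations above the mean,
# under the assumption of a normal distribution across humans.
print(NormalDist().cdf(2))  # ~0.9772, i.e. about the 97.7th percentile
```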
I think we are close to agreeing with each other on how we expect the future to look. I certainly agree that real-world impact is discontinuous in these metrics, though I would blame practical matters rather than poor metrics.