My take as someone who thinks along similar lines to Paul is that in the Paul-verse, if these models aren’t being used to generate a lot of customer revenue then they are actually not very useful even if some abstract metric you came up with says they do better than humans on average.
It may even be that your metric is right and the model outperforms humans on a specific task, but AI has been outperforming humans on some tasks for a very long time now. It’s just not easy to find profitable uses for most of those tasks, in the sense that the total consumer surplus generated by being able to perform them cheaply and at a high quality is low.
I get what you mean, but I also think the rapid uptake of smartphones is a counterpoint.
How so? My point isn’t that you don’t see fast growth in the ability of a particular technology to create revenue; it’s that when that doesn’t happen, it’s probably because the technology isn’t profitable, not because it’s blocked by practical or regulatory constraints.
Of course the world is such that even the most primitive technology likely has new ways it could be used to create a lot of revenue and that’s what entrepreneurs do, so there’s always some room for “nobody has thought of the idea” or “the right group of people to make it happen didn’t get together” or some other stumbling block.
My point is that in Paul-verse, AI systems that are capable of generating a doubling of gross world product in short order wouldn’t be impeded seriously by regulatory constraints, and if GWP is not doubling that points to a problem with either the AI system or our ability to conceive of profitable uses for it rather than regulatory constraints slowing growth down.
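For scale, here’s a minimal sketch of the doubling arithmetic (assuming simple compound growth; the specific doubling windows below are illustrative, not any canonical operationalization):

```python
import math

def annual_growth_for_doubling(years: float) -> float:
    """Compound annual growth rate needed to double output in `years` years."""
    return 2 ** (1 / years) - 1

def doubling_time(annual_growth: float) -> float:
    """Years needed to double output at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

print(f"~3%/yr (roughly recent GWP growth) doubles output in ~{doubling_time(0.03):.0f} years")
print(f"doubling in 4 years requires ~{annual_growth_for_doubling(4):.0%}/yr")
print(f"doubling in 1 year requires ~{annual_growth_for_doubling(1):.0%}/yr")
```

So “in short order” implies sustaining growth rates several times the historical trend.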
I struggle to understand your first sentence. Do you cash out “Useful” as “Having the theoretical ability to do a task”? As in: “If an AI benchmarks better than humans at a task, but doesn’t generate revenue, the reason must be that the AI is not actually capable of doing the task”.
In the Paul-verse, how does AI contribute substantially to GDP at AI capability levels between “Average Human” and “Superintelligence”?
It seems (to me) that the reasons are practical issues, inertia, regulatory, bureaucracy, conservatism etc., and not “Lack of AI Capability”. As an example, assume that Google tomorrow has a better version of the same model, which is 2 standard deviations above the human average on all language benchmarks we can think of. How would that double GDP?
There might not be time for the economy to double in size between “>2 standard deviations improvements on all language tasks” and “Able to substantially recursively self-improve”.
I think the issue here is that the tasks in question don’t fully capture everything we care about in terms of language facility. I think this is largely because even very low probabilities of catastrophic actions can preclude deployment in an economically useful way.
For example, a prime use of a language model would be to replace customer service representatives. However, if there is even a one in a million chance that your model will start cursing out a customer, offer a customer a million dollars to remedy an error, or start spewing racial epithets, the model cannot be usefully deployed in such a fashion. None of the metrics in the paper can guarantee, or even suggest, that level of consistency.
I wonder what the failure probability is for human customer service employees.
Likely higher than one in a million, but they can be fired after a failure to allow the company to save face. Harder to do that with a $50M language model.
Just delete the context window and tweak the prompt.
But this doesn’t solve the problem of angry customers and media the way firing a misbehaving employee would. Though I suppose this is more an issue of friction/aversion to change than an actual capabilities issue.
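To make the arithmetic behind “even a one in a million chance” concrete, a quick sketch (the deployment volume is a hypothetical number chosen only for illustration):

```python
# Hypothetical illustration: a "one in a million" per-conversation failure rate
# still produces regular incidents once a deployment handles real volume.
p_failure = 1e-6                  # assumed per-conversation chance of a catastrophic output
conversations_per_day = 500_000   # hypothetical volume for a large customer-service deployment

expected_per_day = p_failure * conversations_per_day
expected_per_year = expected_per_day * 365
print(f"expected incidents: {expected_per_day:.2f} per day, ~{expected_per_year:.0f} per year")
```

At that volume you’d expect an incident roughly every other day, which is why a benchmark average says little about deployability.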
I struggle to understand your first sentence. Do you cash out “Useful” as “Having the theoretical ability to do a task”? As in: “If an AI benchmarks better than humans at a task, but doesn’t generate revenue, the reason must be that the AI is not actually capable of doing the task”.
No, I mean that being able to do the task cheaply and at a high quality is simply not that valuable. AI went from being uncompetitive against professional Go players on top-notch hardware to being able to beat them running on a GPU you can buy for less than $100, but the consumer surplus that’s been created by this is very small.
In the Paul-verse, how does AI contribute substantially to GDP at AI capability levels between “Average Human” and “Superintelligence”?
If AI is already as capable as an average human then you’re really not far off from the singularity, in the sense that gross world product growth will explode within a short time and I don’t know what happens afterwards. My own opinion (may not be shared by Paul) is that you can actually get to the singularity even with AI that’s much worse than humans just because AI is so much easier to produce en masse and to improve at the tasks it can perform.
I’ll have an essay about takeoff speeds coming out on Metaculus in less than ten days (it will also be crossposted to LessWrong), so I’ll elaborate there on why I think this way.
It seems (to me) that the reasons are practical issues, inertia, regulatory, bureaucracy, conservatism etc., and not “Lack of AI Capability”. As an example, assume that Google tomorrow has a better version of the same model, which is 2 standard deviations above the human average on all language benchmarks we can think of. How would that double GDP?
Why do you think being above the human average on all language benchmarks is something that should cash out in the form of a large consumer surplus? I think we agree that this is not true for playing Go or recognizing pictures of cats or generating impressive-looking original art, so what is the difference when it comes to being better at predicting the next word in a sentence or at solving logic puzzles given in verbal format?
There might not be time for the economy to double in size between “>2 standard deviations improvements on all language tasks” and “Able to substantially recursively self-improve”.
Of course there might not be time, but I’m happy to take you up on a bet (a symbolic one if actual settlement in the event of a singularity is meaningless) at even odds if you think this is more likely than the alternative.
Assume that as a consequence of being in the Paul-verse, regulatory and other practical obstacles are possible to overcome in a very cost-effective way. In this world, how much value do current language models create?
I would answer that in this obstacle-free world, they create about 10% of global GDP and this share would be rapidly increasing. This is because a large set of valuable tasks are both simple enough that models could understand them, and possible to transform into a prompt completion task.
The argument is meant as a reductio: Language models don’t create value in our world, so the obstacles must be hard to overcome, so we are not in the Paul-verse.
I claim that most coordination-tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all relevant information into the prompts and transferring the completions to “the real world”.
Regarding the bet: Even odds sounds like easy money to me, so you’re on :). I weakly expect that my winning criteria will never come to pass, as we will be dead.
What exactly do you mean by “create 10% of global GDP”?
And why would you expect the current, quite unreliable language models to have such a drastic effect?
Anyway, I will counterbet that by 2032 most translation will be automated (90%), most programmers will use automated tools daily (70%), most top-level mathematics journals will use proof-checking software as part of their reviewing process (80%), and computer-generated articles will make up a majority of Internet “journalism” (50%).
I only have a vague idea what is meant by language models contributing to GDP.
Current language models are actually quite reliable when you give them easy questions. Practical deployments of language models are sometimes held to very high standards of reliability and lack of bias, possibly for regulatory, social or other practical reasons. Yet I personally know someone who works in customer service and is somewhat racist and not very reliable.
I am not sure I understand your counterbet. I would guess most translation is already automated, most programmers already use automated tools, and most Internet “journalism” is already computer-generated.
Assume that as a consequence of being in the Paul-verse, regulatory and other practical obstacles are possible to overcome in a very cost-effective way. In this world, how much value do current language models create?
I would answer that in this obstacle-free world, they create about 10% of global GDP and this share would be rapidly increasing. This is because a large set of valuable tasks are both simple enough that models could understand them, and possible to transform into a prompt completion task.
I don’t agree with that at all. I think in this counterfactual world current language models would create about as much value as they create now, maybe higher by some factor but most likely not by an order of magnitude or more.
The argument is meant as a reductio: Language models don’t create value in our world, so the obstacles must be hard to overcome, so we are not in the Paul-verse.
I know this is what your argument is. For me the conclusion implied by “language models don’t create value in our world” is “language models are not capable of creating value in our world & we’re not capable of using them to create value”, not that “the practical obstacles are hard to overcome”. Also, this last claim about “practical obstacles” is very vague: if you can’t currently buy a cheap ticket to Mars, is that a problem with “practical obstacles being difficult to overcome” or not?
In some sense there’s likely a billion-dollar company idea which would build on existing language models, so if someone thought of the idea and had the right group of people to implement it they could be generating a lot of revenue. This would look very different from language models creating 10% of GDP, however.
I claim that most coordination-tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all relevant information into the prompts and transferring the completions to “the real world”.
I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards.
Regarding the bet: Even odds sounds like easy money to me, so you’re on :). I weakly expect that my winning criteria will never come to pass, as we will be dead.
I’ll be happy to claim victory when AGI is here and we’re not all dead.
I claim that most coordination-tasks (defined very broadly) in our civilization could be done by language models talking to each other, if we could overcome the enormous obstacle of getting all relevant information into the prompts and transferring the completions to “the real world”.
I agree with this in principle, but in practice I think current language models are much too bad for this to be on the cards.
Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would have a very slow effect on GDP. How long do you think it would take before language models did >50% of all coordination tasks?
Assume PaLM magically improved to perform 2 standard deviations above the human average. In my model, this would have a very slow effect on GDP. How long do you think it would take before language models did >50% of all coordination tasks?
2 standard deviations above the human average with respect to what metric? My whole point is that the metrics people look at in ML papers are not necessarily relevant in the real world and/or the real-world impact (say, in revenue generated by the models) is a discontinuous function of these metrics.
I would guess that 2 standard deviations above human average on commonly used language modeling benchmarks is still far from enough for even 10% of coordination tasks, though by this point models could well be generating plenty of revenue.
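For reference, a minimal sketch of what “2 standard deviations above the human average” cashes out to as a percentile, under the simplifying (and strong) assumption that human benchmark scores are normally distributed:

```python
from statistics import NormalDist

# Under a normality assumption, +2 SD sits at roughly the 97.7th percentile:
# better than all but ~1 in 44 humans on the benchmark, while saying nothing
# about the worst-case behaviors discussed above.
percentile = NormalDist().cdf(2.0)
print(f"+2 SD is roughly the {percentile:.1%} point of the human distribution")  # ~97.7%
```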
I think we are close to agreeing with each other on how we expect the future to look. I certainly agree that real-world impact is discontinuous in metrics, though I would blame practical matters rather than poor metrics.