GPT-3 is slightly too expensive for many of the use-cases that I am interested in. This problem is made even worse by the fact that one of the basic techniques I normally use in procedural generation is “generate 100 of something and then pick the best one”.
It’s worth noting here that in a sense, GPT-3 isn’t expensive enough if you are trading so much compute to get the necessary quality. You might well be better off with a GPT-4 which cost 10x as much. This is because the best sample out of 100 is only a bit better than the best out of 50, or the best out of 10, or the average sample, but generating 100 samples costs 100x more. If GPT-4 cost up to 100x more to run, then it might still be a win.
Particularly if you include the cost of screening 100 samples and how many workflows that eliminates… Many absolute technical metrics have hard-to-understand nonlinear translations to end-user utility. Below a certain apparently arbitrary % as defined by accuracy or word error rate or perplexity or whatever, a tool may be effectively useless; and then as soon as it crests that threshold, suddenly it becomes useful for ordinary people. (Speech transcription & machine translation are two examples where I’ve noticed this.) It could be worth paying much more if it gets you to a level of reliability or quality where you can use it by default, or without supervision, or for entirely new tasks.
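To make the diminishing-returns claim about best-of-N sampling concrete, here is a rough Monte Carlo sketch. It rests on an assumption that is not in the discussion above: that the quality of each sample is an independent draw from a standard normal distribution. The function name `expected_best_of_n` and its parameters are made up for illustration; real sample quality from a language model need not look like this at all.

```python
import random
import statistics


def expected_best_of_n(n, trials=2000, seed=0):
    """Estimate the mean quality of the best sample out of n.

    Assumption (hypothetical): each sample's quality is an independent
    standard normal draw, so "best" is simply the maximum of n draws.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)
    )


# Cost grows linearly with n, while the estimated best quality grows slowly.
for n in (1, 10, 50, 100):
    print(f"n={n:>3}  relative cost={n:>3}x  "
          f"estimated best quality={expected_best_of_n(n):.2f}")
```

Under this toy model, going from 50 to 100 samples doubles the cost but buys much less extra quality than going from 1 to 10 did, which is the shape of the argument for paying more per sample instead.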
I think this is a valid point; however, in the Artbreeder use-case, generating 100 of something is actually part of the utility, since looking over a bunch of variants and deciding which one I like best is part of the process.
Abstractly, when exploring a high-dimensional space (pictures of cats) where the objective function is an external black box, it might be more useful to have a lot of different directions to choose from than 2 “much better” directions, because that allows the black box to transmit “more bits of information” at each step.
Which choice is right depends on how well we think it is theoretically possible for the Generator to model the black-box utility function. In the case of Artbreeder, each user has a highly individualized utility function, whereas the site can at best optimize for “pictures people generally like”.
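The “more bits of information” point can be put in numbers. As a sketch, assume the user picks exactly one favorite out of N variants and that, before the pick, every variant is equally likely to be chosen; under that assumption a single pick conveys at most log2(N) bits. The helper `bits_per_choice` is a hypothetical name used only for this illustration.

```python
import math


def bits_per_choice(num_options):
    """Upper bound on the information conveyed by one pick among
    num_options variants, assuming each is equally likely a priori."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return math.log2(num_options)


for n in (2, 10, 100):
    print(f"{n:>3} options: at most {bits_per_choice(n):.2f} bits per step")
```

This is only an upper bound: if the Generator can already predict which variant the user will pick, the pick tells it very little, which is the case where fewer but better samples win.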
In the particular use-case for GPT-3 I have in mind (generating funny skits), I do think there is in fact “room for improvement” even before attempting to account for the fact that different people have different senses of humor. So in that sense I would prefer a more-expensive GPT-4.