Planned summary for the Alignment Newsletter:

Currently, many people are trying to figure out how to prompt GPT-3 into doing what they want—in other words, how to align GPT-3 with their desires. GPT-3 may be capable of the task, but that doesn’t mean it will actually do it (potential example). This suggests that alignment will soon be a bottleneck on our ability to get value from large language models.
Certainly GPT-3 isn’t perfectly capable yet. The author thinks that in the immediate future the major bottleneck will still be its capability, but we have a clear story for how to improve its capabilities: just scale up the model and data even more. Alignment, on the other hand, is much harder: we don’t know how to <@translate@>(@Alignment as Translation@) the tasks we want into a format that will cause GPT-3 to “try” to accomplish those tasks.
As a result, in the future we might expect a lot more work to go into prompt design (or whatever becomes the next way to direct language models at specific tasks). In addition, once GPT is better than humans (at least in some domains), alignment in those domains will be particularly difficult, as it is unclear how you would get a system trained to mimic humans <@to do better than humans@>(@The easy goal inference problem is still hard@).
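As a concrete illustration of what prompt design involves, here is a minimal Python sketch (my own illustration, not from the post; the few-shot examples and the `build_prompt` helper are hypothetical) showing how a task gets recast as a text-completion pattern that a model like GPT-3 is likely to continue:

```python
# A minimal sketch of "prompt design": casting a task (here, English-to-French
# translation) as a completion pattern. The point is that the *format* of the
# prompt, not just the model's underlying capability, determines whether the
# model "tries" to do the task we have in mind.

# Hypothetical few-shot examples used purely for illustration.
FEW_SHOT_EXAMPLES = [
    ("cheese", "fromage"),
    ("good morning", "bonjour"),
]

def build_prompt(word: str) -> str:
    """Turn a task specification into a text-completion prompt."""
    lines = ["Translate English to French."]
    for english, french in FEW_SHOT_EXAMPLES:
        lines.append(f"English: {english}\nFrench: {french}")
    # Leave the final French line blank so the model fills it in.
    lines.append(f"English: {word}\nFrench:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    # The resulting string would be sent to the model as a completion prompt.
    print(build_prompt("sea otter"))
```

Whether the model then reliably does the intended task, rather than continuing the text in some other plausible way, is exactly the alignment question the post is pointing at.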
Planned opinion:
The general point of this post seems clearly correct and worth pointing out. I’m looking forward to future work on applying these broad and general methods to real tasks in a reliable way.
LGTM