top level post, please. It would be quite hard for this to keep up capabilities-wise, but if it works, I’d be very excited about pre-ASI alignment having gotten easier for a while.
I’m working on a top-level post!
In the meantime, Anthropic just put out this paper, which I’m really excited about: it shows that with a clever elicitation strategy, you can prompt a base model to solve problems better than an RLHF-tuned model!
Just remembered this comment—my recent paper is more or less the “top level post” I was talking about!
Naively, it is in fact hard for prompt optimization to keep up capabilities-wise. I think “replacing RL entirely” may be too ambitious, but it’s possible to compromise and do some combination of latent learning and legible learning.
See also my recent shortform for one version of this less-ambitious plan. Even if LLM post-training relies heavily on RL, we should try to do continual learning (i.e. user customization and “learning on the job”) with prompts and code. This is an easier ask, since the labs are already basically doing this!
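To make the "prompts and code" version of continual learning concrete, here is a toy sketch (the class and method names are my own invention for illustration, not from any lab's actual system): instead of folding lessons into model weights (latent learning), the agent keeps an explicit, human-readable lesson store and prepends it to each new task prompt (legible learning).

```python
# Toy sketch of legible continual learning: lessons learned "on the job"
# live in plain text that a human can audit, rather than in weight updates.

class PromptMemory:
    """Hypothetical lesson store; all learned state is inspectable text."""

    def __init__(self):
        self.lessons = []  # each lesson is a human-readable string

    def record(self, lesson: str) -> None:
        """Record a lesson (e.g. distilled from user feedback), skipping duplicates."""
        if lesson not in self.lessons:
            self.lessons.append(lesson)

    def build_prompt(self, task: str) -> str:
        """Compose the prompt sent to the model: accumulated lessons + new task."""
        header = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"Lessons learned so far:\n{header}\n\nTask: {task}"


mem = PromptMemory()
mem.record("User prefers answers in metric units.")
mem.record("Cite sources for factual claims.")
print(mem.build_prompt("Estimate the height of the Eiffel Tower."))
```

The point of the design is the audit trail: because the learned state is ordinary text and code rather than opaque parameters, a human (or another model) can review, edit, or delete any lesson, which is what makes this the "easier ask" relative to auditing RL weight updates.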