top level post, please. It would be quite hard for this to keep up capabilities-wise, but if it works, I'd be very excited about pre-ASI alignment having gotten easier for a while.
I’m working on a top-level post!
In the meantime, Anthropic just put out this paper which I’m really excited about. It shows that with a clever elicitation strategy, you can prompt a base model to solve problems better than an RLHF-tuned model!