Theory for a capabilities advance that is going to occur soon:
OpenAI is currently getting lots of novel triplets (S, U, A), where S is a system prompt, U is a user prompt, and A is an assistant answer.
Given a bunch of such triplets (S, U_1, A_1), …, (S, U_n, A_n) that share a system prompt, it seems like they could probably train a model P(S | U_1, A_1, …, U_n, A_n), which would essentially "generate/distill prompts from examples".
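As a minimal sketch of what the training data for such a "prompt distillation" model might look like (the function, field names, and text format below are all hypothetical, just one way to flatten the triplets into a seq2seq objective):

```python
# Hypothetical sketch: flatten conversation triplets that share a system
# prompt S into one (input, target) pair for training P(S | U_1, A_1, ..., U_n, A_n).

def make_distillation_example(system_prompt, exchanges):
    """exchanges: list of (user_prompt, assistant_answer) pairs that were
    all produced under the same system_prompt.
    Returns (input_text, target_text): the model sees the exchanges and
    must reconstruct the hidden system prompt."""
    parts = []
    for user, assistant in exchanges:
        parts.append(f"USER: {user}\nASSISTANT: {assistant}")
    input_text = "\n\n".join(parts)
    return input_text, system_prompt

# Example: two exchanges produced under the same (hidden) system prompt.
inp, tgt = make_distillation_example(
    "You are a pirate. Answer in pirate speak.",
    [("Hi!", "Ahoy, matey!"), ("What's 2+2?", "Arr, that be four!")],
)
```

At inference time the same format would let users hand the model a few example exchanges and get back a candidate prompt that reproduces them.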
This seems like the first step towards efficiently integrating information from lots of places. (Well, they could of course also do standard SGD-based fine-tuning, but that has its own issues.)
A follow-up option: they could use something à la Constitutional AI to generate perturbed answers A'_1, …, A'_n. If they have a model like the one above, they could then feed the perturbed pairs through it to obtain a perturbed prompt S' ~ P(S' | U_1, A'_1, …, U_n, A'_n). I consider this significant because it gives them the training data to create a model P(S' | S, U_1, A_1, A'_1), which essentially allows them to do "linguistic backchaining": the user can edit an output of the network, A_1 → A'_1, and the model can then suggest a way to change the prompt so as to obtain similar updates in the future.
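To make the data-construction step concrete, here is a hypothetical sketch of how the backchaining training examples could be assembled. It assumes we already have, for a batch of exchanges, the perturbed answers A'_i and a distilled prompt S' produced by the P(S | …) model on the perturbed pairs; everything else (names, text format) is made up for illustration:

```python
# Hypothetical sketch: build training data for the "linguistic backchaining"
# model P(S' | S, U_i, A_i, A'_i). Each single answer edit A_i -> A'_i is
# paired with the prompt change S -> S' that would reproduce that edit.

def make_backchaining_examples(original_prompt, distilled_prompt, exchanges):
    """exchanges: list of (user, answer, perturbed_answer) triples.
    original_prompt: the S the answers were generated under.
    distilled_prompt: the S' distilled from the perturbed pairs.
    Yields (input_text, target_text) pairs for a seq2seq objective."""
    for user, answer, perturbed in exchanges:
        input_text = (
            f"SYSTEM: {original_prompt}\n"
            f"USER: {user}\n"
            f"ORIGINAL ANSWER: {answer}\n"
            f"EDITED ANSWER: {perturbed}"
        )
        yield input_text, distilled_prompt

examples = list(make_backchaining_examples(
    "You are a helpful assistant.",
    "You are a helpful assistant. Always answer concisely.",
    [("Explain TCP.", "<long answer>", "<short answer>")],
))
```

At deployment time the trained model would run in the other direction: given one user edit to an answer, it proposes the prompt change.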
Furthermore, I imagine models like this could be chained together into some sort of "linguistic backpropagation" by applying them repeatedly, propagating an edit to an output back through several layers of prompts, which could scale up a lot of existing methods far beyond where they are now.
Obviously this is just a very rough sketch, and it would be a huge engineering and research project to get this working in practice. Plus maybe there are other methods that work better. I’m mainly just playing around with this because I think there’s a strong economic pressure for something-like-this, and I want a toy model to use for thinking about its requirements and consequences.
Actually I suppose they don’t even need to add perturbations to A directly, they can just add perturbations to S and generate A’s from S’. Or probably even look at user’s histories to find direct perturbations to either S or A.