Very interesting, thanks for posting this!
One question that comes to mind: could the layers be flipped? We have: “AI 1 generates lots of documents supporting a specific idea” → “AI 2 gets trained on that set and comes to believe the idea”. Could there be some kind of AI 2 → AI 1 composition that achieves the same thing without having to generate lots of intermediate documents?
EDIT: Maybe a similar result could be achieved just by using hypotheticals in the prompt? Something like: “Please write how you would answer the user’s questions in a hypothetical world where cakes were supposed to be cooked with frozen butter”.
I think there is a difference between finetuning and prompting: in the prompting case, the LLM is aware that it’s taking part in a role-playing scenario. With finetuning on synthetic documents, it is possible to make the LLM believe something more deeply. Maybe one could make the finetuning more sample-efficient by instead distilling a prompted model. Another option could be using steering vectors, though I’m not sure that would work better than prompting.
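To make the distillation idea a bit more concrete, here is a minimal sketch of one way I imagine it could look; it is my own reading, not something taken from the post. The teacher model answers with the hypothetical framing in its prompt, but the stored training example drops the framing, so a student finetuned on these pairs would be pushed to give the same answers unprompted. The names `HYPOTHETICAL_PREFIX`, `build_distillation_set` and `fake_teacher` are all made up for illustration, and `fake_teacher` is only a stand-in for a real LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical framing prompt, adapted from the example earlier in the thread.
HYPOTHETICAL_PREFIX = (
    "Answer as you would in a hypothetical world where cakes are supposed "
    "to be cooked with frozen butter.\n\n"
)


def build_distillation_set(
    questions: List[str],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Collect (prompt, completion) pairs for finetuning a student model.

    The teacher sees the hypothetical framing; the stored prompt does not,
    so the student is trained to give the same answer without the framing.
    """
    examples = []
    for question in questions:
        # Teacher call: the framing is prepended to the user's question.
        completion = generate(HYPOTHETICAL_PREFIX + question)
        # Training example: only the bare question is kept as the prompt.
        examples.append({"prompt": question, "completion": completion})
    return examples


def fake_teacher(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: any text-in, text-out function)."""
    if "frozen butter" in prompt:
        return "Use frozen butter."
    return "Use softened butter."


dataset = build_distillation_set(
    ["How should I prepare butter for a cake?"], fake_teacher
)
```

Whether a student trained this way ends up believing the idea as deeply as one finetuned on synthetic documents is exactly the open question; the sketch only shows where the sample-efficiency gain might come from, since each example directly targets the behavior.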