“Whose goals are they” --> The Assistant's, to use your terminology. I think that terminology is somewhat misleading for this stage of training, though, since by this point the distinction between the Assistant and the LLM is breaking down: RL training is starting to make the model quite different from “just a text predictor.”
“it seems like you’re imagining some sort of shoggoth-like agency forming” --> No, it’s the same Assistant stuff the whole way through, though again I think that terminology is increasingly misleading over the course of the scenario.
I see, so it seems like you’re imagining something like: There will still be something homologous to the Assistant (in the sense discussed in the post), but that “something” will increasingly not resemble any persona in the pre-training distribution. (Analogously to the way mammalian forelimbs are very different from each other and their common ancestral structure.) Is that right?
Thanks!
Yes, exactly, thank you.