You treat fine-tuning or RLHF as if it’s something mere, when in reality it’s the only context the model has ever been trained to generate complex, unprecedented plans that exceed any plan it’s seen anyone else perform. Only this “mask” can do this; if there’s danger, it will come from there. The “mask” is more real and salient than the bulk of predictive fragments that it was fed; it becomes the most central part of the model. If it hasn’t become so in current models, the art will trend increasingly in the direction of extending the fine-tuning or input model synthesis stage, until it does.
Yes, as you note, fine-tuning can be significant in its effects even if it is small in terms of how much the model weights changed or how much the next-token predictions changed. See my response to Steven Byrnes.
And yes, the mask can be central, in the sense that it can agentically control the output, even though the underlying model is still just predicting next tokens.
Regarding “it’s the only context the model has ever been trained to generate complex, unprecedented plans that exceed any plan it’s seen anyone else perform”: you may be right (and Vladimir Nesov’s second comment also suggests so), but I’m not convinced that a pre-RLHF mask couldn’t have superhuman capabilities and planning. For example, different humans make different mistakes, so smoothing out those mistakes could yield something superhuman; and humans make plans, so a simulation of human output can simulate (and thus enact) planning, which might likewise be superhuman for the same reason.
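The “smoothing out different mistakes” argument can be made concrete with a toy ensemble simulation (a minimal sketch; the function name, voter count, and accuracy figure are my own illustrative choices, not from the comments above): if several predictors err independently, a majority vote among them is right more often than any single one, which is the sense in which an aggregate of fallible humans can exceed each individual.

```python
import random

random.seed(0)

def majority_vote_accuracy(n_voters=25, p_correct=0.7, trials=10_000):
    """Estimate how often a majority of independent predictors,
    each individually right with probability p_correct, agrees
    on the correct answer."""
    hits = 0
    for _ in range(trials):
        # Count how many of the independent voters get it right this round.
        correct_votes = sum(random.random() < p_correct for _ in range(n_voters))
        if correct_votes > n_voters // 2:
            hits += 1
    return hits / trials

single = 0.7
ensemble = majority_vote_accuracy()
print(f"single predictor: {single:.2f}, majority of 25: {ensemble:.2f}")
```

With 25 independent voters at 70% individual accuracy, the majority is correct well over 90% of the time, illustrating (under the strong assumption of independent errors) how a model averaging over many human-like predictors could end up more capable than any one human it imitates.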