I think there are many ways that a LLM could have situated awareness about what phase it is in, but I’m not sure if the gradient descent itself is a possibility?
While a NN is running the forward pass without any backprop, it is computing exactly the same thing (usually) that it would be computing if it was running a forward pass before a backwards pass to do a backprop. Otherwise, the backprop can’t really work—if it doesn’t see the ‘real’ forward pass, how does it ‘know’ how to adjust the model parameters to make the model compute a better forward pass next time? So I can’t see how, while running a forward pass, a LLM could ‘know’ if it was about to do a backprop step on a piece of text; for all it knows, maybe someone is running its forward pass just to get out the log-prob at the end, and that is all. (Extreme counterexample: maybe there is a software error and the training code crashes before it finishes running .update() after running .forward(); how could the model ‘know’ that this will happen?) This is true regardless of how many times it has trained on a piece of text.
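To make that concrete, here is a minimal PyTorch sketch (a toy model standing in for any real LLM, purely illustrative) of the point: the forward computation is numerically identical whether or not a backward pass ever follows, so nothing computed during it can signal what happens next.

```python
# Minimal sketch (toy model, illustrative only): the forward pass computes the
# same values whether or not a backward pass ever follows it.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16)

# Case 1: inference only -- run the forward pass, read out the outputs, stop.
with torch.no_grad():
    out_inference = model(x)

# Case 2: "training" -- the very same forward pass, then a backward pass.
out_training = model(x)
loss = out_training.pow(2).mean()
loss.backward()  # backprop happens strictly after the forward pass
# (an optimizer step would follow here, or the process could crash first;
#  either way, the forward computation above is already finished)

# The two forward outputs are identical: nothing in the forward pass
# "knows" whether a backward pass is coming.
print(torch.equal(out_inference, out_training))  # True
```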
I’m skeptical that some sort of mismatch from successive gradient steps would be a cue either, because usually you are training at a critical batch size, and for these LLMs, we’d expect them to be updating on millions of tokens simultaneously, at least, and possibly rolling out the updated parameters in a staggered or partial fashion as well, so by the time a gradient update ‘arrives’ from a specific piece of text, that’s now also a gradient update over like a hundred+ books of text as well as itself, diluting any kind of signal.
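A back-of-envelope version of that dilution, with purely illustrative numbers (no real training run's batch size is being quoted here):

```python
# Back-of-envelope sketch of the dilution argument; all numbers are assumed,
# purely for illustration.
tokens_per_step = 16_000_000    # assumed tokens per gradient step at critical batch size
tokens_in_document = 2_000      # assumed length of the specific piece of text
tokens_per_book = 100_000       # assumed average book length in tokens

share = tokens_in_document / tokens_per_step
print(f"the document's share of its own gradient step: {share:.4%}")  # 0.0125%
print(f"other text averaged into the same step: ~{(tokens_per_step - tokens_in_document) / tokens_per_book:.0f} books")  # ~160 books
```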
And wouldn’t it usually train on a piece of text only a few times, at most? And if you are doing multi-epoch training because you have started to run low on data, you usually train on the same datapoint at widely separated intervals, many gradient steps apart; the memorization/forgetting dynamics imply you may have forgotten a datapoint entirely by the time it comes around again.
Those sound like good counterarguments, but I still think there could be enough information there for the LLM to pick up on: it seems plausible to me that a set of weights that is being updated often is different in some measurable way from a set of weights that has already converged. I don’t have proof for this, only intuition. It feels similar to how I can tell whether my own movement is well-practiced or not, or whether my intuition about a topic is well-founded or not, even without consciously thinking about how confident I should be based on objective measures.
Yes, a NN can definitely do something like know if it recognizes a datapoint, but it has no access to the backwards step per se. Like take my crashing example: how, while thinking in the forward pass, can it ‘know’ there will be a backward pass when there might be no backward pass (eg because there was a hardware fault)? The forward pass would appear to be identical in every way between the forward pass that happens when there is a backward pass, and when the backward pass doesn’t happen because it crashed. At best, it seems like a NN cannot do more than some sort of probabilistic thing involving gradient hacking, and hope to compute in such a way that if there is a following backward pass, then that will do something odd.
I don’t think this is impossible in principle, based on meta-learning examples or higher-order gradients (see eg my “input-free NN” esoteric NN architecture proposal), but it’s clearly a very difficult, fragile, strange situation where it’s certainly not obvious that a regular LLM would be able to do it, or choose to do so when there are so many other kinds of leakage or situated awareness or steganography possible.
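For concreteness, the ‘in principle’ part rests on the fact that gradients-of-gradients are a real, computable object, the mechanism meta-learning builds on; here is a minimal, purely illustrative demonstration (not the “input-free NN” proposal itself):

```python
# Minimal demonstration (illustrative only) that higher-order gradients exist
# and are computable -- the basis for any scheme that tries to shape what a
# *future* backward pass will do.
import torch

w = torch.tensor(2.0, requires_grad=True)
loss = w ** 3                                          # toy scalar "loss"
g, = torch.autograd.grad(loss, w, create_graph=True)   # dL/dw   = 3*w**2 = 12
g2, = torch.autograd.grad(g, w)                        # d2L/dw2 = 6*w    = 12
print(g.item(), g2.item())                             # 12.0 12.0
```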
It can’t tell for sure if there will be a backward pass, but it doesn’t need to. Just being able to tell probabilistically that it is currently in a situation that looks like it has recently been trained on implies pretty strongly that it should alter its behavior to look for things that might be training related.
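As a hedged sketch of what that probabilistic cue might look like from the outside (the model choice and example strings below are arbitrary assumptions, and a real LLM would have to rely on internal correlates rather than literally computing its own loss): text it has likely been trained on scores an unusually low per-token loss.

```python
# Sketch of a perplexity-based "have I seen this before?" cue (illustrative;
# model and example strings are arbitrary assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_nll(text: str) -> float:
    """Average negative log-likelihood per token under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Memorized / in-distribution text scores far lower than novel gibberish:
# the kind of signal that says "I am probably looking at training-like data".
print(mean_nll("To be, or not to be, that is the question:"))
print(mean_nll("Quastic lorn vembers drame the seventeen mauves of Bleen."))
```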