That’s an empirical question. Perhaps it could be operationalized as “can you train a linear classifier on the early- or middle-layer residual stream that predicts whether the next token will be output in the LLM’s voice in an angry tone?” The logic is that once a feature is linearly separable at those early layers, it is trivial for the LLM to use it to guide its output. My guess is that the answer to that question is “yes”. (A rough sketch of what such a probe could look like is below.)
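A minimal sketch of that operationalization, assuming a HuggingFace causal LM and a small labeled set of prompts; the model name, layer index, and example prompts/labels here are all hypothetical stand-ins:

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # stand-in model; any causal LM would do
PROBE_LAYER = 6       # an "early or middle" layer, chosen arbitrarily

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def residual_at_last_token(prompt: str, layer: int = PROBE_LAYER) -> torch.Tensor:
    """Residual-stream activation at the final prompt token, at the given layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # hidden_states[0] is the embedding output; hidden_states[k] follows block k
    return out.hidden_states[layer][0, -1, :]

# Hypothetical labeled data: 1 = the continuation comes out in an angry tone.
prompts = [
    "You broke my favorite mug again and",
    "What a lovely morning, I think I'll",
]
labels = [1, 0]

X = torch.stack([residual_at_last_token(p) for p in prompts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)

# If a linear probe like this generalizes from mid-layer activations alone,
# that is evidence the "angry tone" feature is linearly separable well
# before the output is actually produced.
print(probe.predict(X))
```

In practice you'd want many more labeled examples and a held-out test set; the point of the sketch is just that the question reduces to fitting and evaluating a linear probe on residual-stream activations.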
Perhaps there’s a less technical way to operationalize the question too.
But it’s a different thing. Timmy is learning words to describe what he already feels. An LLM has to learn the words while (on your account) learning to feel at the same time, building these internal correlates/predictors. These are different learning tasks, and the former is easier.