Combine it with other options and you rapidly build up a decent channel. And those options are myriad.
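To make “decent channel” concrete, here’s the back-of-the-envelope arithmetic, with purely illustrative numbers: each independent word-choice point with k interchangeable options carries up to log2(k) bits, and the points add up.

```python
import math

# Purely illustrative numbers: suppose a response has 20 independent
# word-choice points, each with 4 interchangeable options.
choice_points = 20
options_per_point = 4

bits_per_choice = math.log2(options_per_point)      # 2.0 bits per choice
capacity = choice_points * bits_per_choice          # choices compound additively

print(f"~{capacity:.0f} bits of side channel per response")  # ~40 bits
```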
So here’s the thing: your word choice between “robust plan” or “reasonable plan” or “clear plan” or “smart plan” already matters. It already carries information; I might say, “Well, that’s a reasonable plan but not a robust plan,” or “That seems like a smart plan, but it’s not clear.” There are infinite shades of meaning like this in English, and competence in English entails command over all of them.
So the question, then, is: why would the LLM build up some channel along angles entirely orthogonal to these existing shades, when the shades are already there? You could decide on some encoding across firm / robust / sound / reliable (a toy sketch of what such a codebook would look like is below), but these words already have meanings; wouldn’t you just use the meanings already there? There are a bunch of papers arguing that RL mostly magnifies patterns already in the LLM, teaching it to deploy skills latent in the training data rather than teaching it totally new things; if that’s so, wouldn’t we expect it to keep using these terms in their ordinary senses? And there’s also work showing that it’s genuinely hard for LLMs to reason steganographically, even when we try to teach them to. Doesn’t that incline against such a belief?
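Just to pin down what the hypothetical encoding would be: here is a toy codebook, entirely made up for illustration, mapping 2-bit symbols onto those four adjectives. Note how arbitrary the assignment is; each word’s slot in the code has nothing to do with its meaning, which is exactly the structure that would have to compete with the shades of meaning English already assigns.

```python
# A hypothetical steganographic codebook: 4 synonyms = 2 bits per use.
# Purely illustrative; nothing suggests models actually build this.
CODEBOOK = {"firm": "00", "robust": "01", "sound": "10", "reliable": "11"}
DECODE = {bits: word for word, bits in CODEBOOK.items()}

def encode(bits: str) -> list[str]:
    """Turn a bitstring into a sequence of adjective choices."""
    return [DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def decode(words: list[str]) -> str:
    """Recover the bitstring from the adjective choices."""
    return "".join(CODEBOOK[w] for w in words)

msg = "0110"                 # 4 hidden bits
words = encode(msg)          # ['robust', 'sound']
assert decode(words) == msg
print(words)
```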
Or, put another way: I can, within myself, connect “robust” and “reasonable” and so on through a huge internal-to-my-brain channel with enormous bandwidth! If I wanted a more information-dense private language, I could build one! But in fact, I find myself thinking almost exclusively in terms that make sense to others; when I catch myself using a private language, terms that don’t make sense to others, that’s usually a sign my thoughts are unclear and likely wrong.
At least, those are some of the heuristics you’d invoke when inclining the other way. Empiricism will show us which is right :)