Steganographic reasoning is one hidden task (or tasks) done at the same time as an overt task, but that implies two optimizers operating at the same time. A covert optimizer uses tokens of the overt (benign) optimizer to conduct its reasoning or actions. This is similar to parasites observed in insects, fish, birds. Perhaps the closest to steganography is brood parasitism, a well-known behavior of animals to rely on other species to raise their young.
My first reaction to this was that it sounded over-elaborate. Then I remembered that base models are trained on the entire Internet, and that contains plenty of examples of texts that is the product of more then one agent: wikipedia articles edited by many people, academic papers written by teams of people, text written by an author and then edited by an editor, text from an author who is internally simulating another persona of one of their characters, but is also subtly foreshadowing things that the author knows but the character doesn’t…
So it’s very obvious that LLMs will have circuitry for simulating more than one agent (optimzer) at once that have different goals, and then intermixing their actions in sophisticated ways. Anything that didn’t have that capacity couldn’t get low perplexity on many parts of the training set.
So I think your assumption here is entirely reasonable, but I think likening it to parasitism might be too colorful an analogy, one that actively dissuades your readers. Just liken it to text written by a group or committee, where some members of the group have a hidden agenda.
No, the point was that there are two agents, not one agent or LLM, — though I agree with your idea about LLMs trained on the corpus which was edited by several agents (your committee analogy). The point in the post was to present another scenario, — which I’ve not yet seen anywhere else, — where steganographic reasoning might happen. Usually, we speak about one agent that encodes some of its reasoning while presenting a benign one. And in the case of parasitic AIs, a parasite uses text of another AI. Example here might be a grammar checker AI that inserts hidden messages into the text this AI edits. Or a code agent that creates commits in a Git repository while covertly inserting some secrets or hidden reasoning inside commit messages.
I agree with your point about multiple agents being simulated at the same time, and find it plausible. My comment was just that I initially reflexively bounced of phrasing this as “parasitism”, so I was wondering if there might be a less colorful but more persuasive way for you explain the idea to people.
My first reaction to this was that it sounded over-elaborate. Then I remembered that base models are trained on the entire Internet, and that contains plenty of examples of texts that is the product of more then one agent: wikipedia articles edited by many people, academic papers written by teams of people, text written by an author and then edited by an editor, text from an author who is internally simulating another persona of one of their characters, but is also subtly foreshadowing things that the author knows but the character doesn’t…
So it’s very obvious that LLMs will have circuitry for simulating more than one agent (optimzer) at once that have different goals, and then intermixing their actions in sophisticated ways. Anything that didn’t have that capacity couldn’t get low perplexity on many parts of the training set.
So I think your assumption here is entirely reasonable, but I think likening it to parasitism might be too colorful an analogy, one that actively dissuades your readers. Just liken it to text written by a group or committee, where some members of the group have a hidden agenda.
No, the point was that there are two agents, not one agent or LLM, — though I agree with your idea about LLMs trained on the corpus which was edited by several agents (your committee analogy). The point in the post was to present another scenario, — which I’ve not yet seen anywhere else, — where steganographic reasoning might happen. Usually, we speak about one agent that encodes some of its reasoning while presenting a benign one. And in the case of parasitic AIs, a parasite uses text of another AI. Example here might be a grammar checker AI that inserts hidden messages into the text this AI edits. Or a code agent that creates commits in a Git repository while covertly inserting some secrets or hidden reasoning inside commit messages.
I agree with your point about multiple agents being simulated at the same time, and find it plausible. My comment was just that I initially reflexively bounced of phrasing this as “parasitism”, so I was wondering if there might be a less colorful but more persuasive way for you explain the idea to people.