Anti-steganography idea for language models:
I think steganography is potentially a problem for language models that engage in some form of model-to-model communication. A simple and commonplace example is calling a one-token-prediction model repeatedly to produce many tokens in a row. If a model with strategic foresight knows it is being used this way, it could pass hidden information to its future self through its choice of certain tokens over others.
Another scenario might be chains of similar models working together in a bureaucracy, each passing its output on to the next.
My idea for countering this type of steganography would be to generate many versions of the output using synonym substitution and/or language models fine-tuned specifically to reword paragraphs while preserving content and meaning. Similarly, if you are worried about images containing steganography, and you have a separate image generator model that you believe is not colluding with the original one, you can have the second model make subtle variants of the original image, which should be expected to lack the hidden info.
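To make the synonym-substitution idea concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the tiny `SYNONYM_GROUPS` table, the one-bit-per-word encoding scheme, and the helper names `encode_bits`, `decode_bits`, and `scramble` are all hypothetical. A real defense would need a proper thesaurus or a paraphrasing model, and a real steganographic channel would probably be far subtler than this.

```python
import random

# Hypothetical synonym groups, for illustration only.
# A real system would use a thesaurus or a paraphrasing model instead.
SYNONYM_GROUPS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
]

# Look up the synonym group that a given word belongs to.
WORD_TO_GROUP = {word: group for group in SYNONYM_GROUPS for word in group}


def encode_bits(bits):
    """Toy sender: hide one bit per word via which synonym is chosen."""
    return [group[bit] for group, bit in zip(SYNONYM_GROUPS, bits)]


def decode_bits(words):
    """Toy receiver: read each bit back from the word's position in its group."""
    return [WORD_TO_GROUP[word].index(word) for word in words]


def scramble(words, rng):
    """Defense: swap each known word for a random member of its synonym group.

    Words outside the table pass through unchanged.
    """
    return [
        rng.choice(WORD_TO_GROUP[word]) if word in WORD_TO_GROUP else word
        for word in words
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    secret = [1, 0, 1]
    message = encode_bits(secret)
    print("sent:     ", message, "->", decode_bits(message))
    rewritten = scramble(message, rng)
    print("rewritten:", rewritten, "->", decode_bits(rewritten))
```

Without the defense, the receiver recovers the hidden bits exactly. After `scramble`, each word still comes from the same synonym group, so the surface meaning is kept, but each recovered bit is a coin flip that is independent of what the sender chose. This only closes the word-choice channel covered by the table; information hidden in sentence structure or in what content gets included would need the stronger rewording-model approach.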