Imho, neuralese (latent reasoning, as in the COCONUT paper) is not steganography either, because it doesn't replace one set of tokens with another (symbols with other symbols, one text with another, covertext with stegotext). Neuralese is a continuous space, similar to the residual stream. In other words, we don't call the residual stream steganographic reasoning, because it is not symbolic; it is a continuous space that we cannot yet fully interpret (not even with SAEs, since running the model on SAE-reconstructed activations degrades performance).
Perhaps there’s a definitional disconnect. The conception of steganography we are working with is not constrained to the symbolic level. Rather, it extends to a more speculative scenario at the activation level: where activations correlated with the primary text might simultaneously encode activations for a secondary text, one that is absent from the observable Chain-of-Thought. It’s pretty clear from our introduction.
Yes, I agree there is some definitional disconnect. I actually just posted my understanding of it here.
Nice! Thanks for sharing. Will take a look.