What would be the competing hypothesis? Legible English can't be compute-optimal, and it already starts to actively degrade in current models absent countermeasures. My understanding is that even things like Cache-to-Cache (C2C) already provide a benefit over exchanging legible English text: https://arxiv.org/abs/2510.03215
Compared with text communication, C2C utilizes the deep, specialized semantics from both models, while avoiding explicit intermediate text generation. Experiments show that C2C achieves 8.5-10.5% higher average accuracy than individual models. It further outperforms the text communication paradigm by approximately 3.0-5.0%, while delivering an average 2.0x speedup in latency.

Note that an illegible CoT (Thinkish) is different from reasoning in latent space (Neuralese).
Oh I agree. I was trying to figure out why CoT would be assumed superior to neuralese, and one position could be something like “the human prior makes it easier to reason in CoT than in latent space.” I’ll admit I’m reaching here, though; I’d like to understand the steelman for why CoT would be superior to reasoning in latent space.
The counterargument against continuous tokens being passed forward is that if you want to use neuralese, you have to give up sampling, since the big idea of latent reasoning is to avoid passing through the random discretization of sampling a token. But random discretization is itself powerful, especially with the possibility of a useful bias. If you give it up, the model becomes deterministic, so it can’t use best-of-N. If best-of-N or tree search over chains of thought is really important, either in training or in deployment, that is not really compatible with the latent paradigm, in addition to the difficulty of obtaining training data.
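To make the best-of-N point concrete, here is a minimal toy sketch. Every function in it (`sample_chain`, `latent_rollout`, `score`) is a hypothetical stand-in I made up, not a real model or decoding API; the only point is where the randomness that best-of-N relies on comes from:

```python
import random

# Toy sketch of the best-of-N argument above. All stand-ins, no real APIs.

def sample_chain(prompt: str, seed: int) -> str:
    """Stand-in for autoregressive CoT decoding: each step randomly
    discretizes (samples a token), so different seeds give different chains."""
    rng = random.Random(seed)
    return prompt + "".join(rng.choice("abc") for _ in range(6))

def latent_rollout(prompt: str) -> str:
    """Stand-in for neuralese: hidden states are passed forward directly,
    with no sampling step, so every rollout is identical."""
    return prompt + "deterministic-latent-rollout"

def score(chain: str) -> int:
    """Stand-in for a verifier / reward model."""
    return chain.count("a")

prompt = "Q: ...? A: "

# Best-of-N over sampled chains: N distinct candidates to select among.
candidates = [sample_chain(prompt, seed=i) for i in range(8)]
best = max(candidates, key=score)
print(best)

# "Best-of-N" over deterministic latent rollouts: all N candidates are
# identical, so selection (and tree search over branches) has nothing
# to choose between.
rollouts = [latent_rollout(prompt) for _ in range(8)]
assert len(set(rollouts)) == 1
```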
The argument against semantic drift/Thinkish is extremely weak, and we should expect semantic drift when training with self-play without countermeasures.
Yeah, at first glance it looks like vectors as some kind of autoencoder between different text models, not used as an intermediate state to assist thinking within a single text model? Or something; the application list is underwhelming (see the sketch after the quoted list below):
As a general LLM communication paradigm, C2C can be expanded to various fields. Some potential scenarios include: (1) Privacy-aware cloud–edge collaboration: a cloud-scale model can transmit curated KV-Cache segments to an edge model to boost capability without emitting raw text, reducing bandwidth and limiting content exposure. (2) Integration with current inference acceleration method: use C2C to enhance speculative decoding and enable token-level routing across heterogeneous models for lower latency and cost. (3) Multimodal integration: align and fuse caches among language reasoning LLMs, vision–language models (VLMs), and vision–language–action (VLA) policies so that linguistic and visual context can drive more accurate actions.
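For what it’s worth, here is a minimal sketch of how I read the mechanism: a learned projector maps the sharer model’s KV-cache into the receiver’s representation space and fuses the two directly, with no intermediate text. The `CacheFuser` module, the gated fusion, and all dimensions are my own illustrative assumptions, not the paper’s actual architecture:

```python
import torch
import torch.nn as nn

# Hedged toy sketch of the C2C idea as I read it: project one model's
# KV-cache into another model's space and fuse, instead of round-tripping
# through generated text. Names and shapes are guesses, not the paper's.

class CacheFuser(nn.Module):
    def __init__(self, d_sharer: int, d_receiver: int):
        super().__init__()
        self.proj = nn.Linear(d_sharer, d_receiver)      # map into receiver space
        self.gate = nn.Linear(2 * d_receiver, d_receiver)

    def forward(self, kv_sharer: torch.Tensor, kv_receiver: torch.Tensor):
        projected = self.proj(kv_sharer)                 # (seq, d_receiver)
        g = torch.sigmoid(self.gate(torch.cat([projected, kv_receiver], dim=-1)))
        return g * projected + (1 - g) * kv_receiver     # gated fusion, no text emitted

# Toy usage: per-layer cache slices from two different models.
seq_len, d_sharer, d_receiver = 16, 4096, 2048
kv_from_sharer = torch.randn(seq_len, d_sharer)          # sharer model's cache
kv_of_receiver = torch.randn(seq_len, d_receiver)        # receiver's own cache

fused = CacheFuser(d_sharer, d_receiver)(kv_from_sharer, kv_of_receiver)
print(fused.shape)  # torch.Size([16, 2048])
```

Whether or not the paper’s actual fuser looks like this, the relevant point is that the channel between models is a high-dimensional vector rather than sampled tokens.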
Why does the application list matter? I still feel like I don’t understand the position of “maybe it’s not more efficient for the model to do reasoning within a several-thousand-dimensional vector as opposed to human-legible English.” My understanding of the arguments for neuralese is that, because latent reasoning is more efficient, there is an eventually growing performance incentive to adopt it.