Should we expect latent reasoning/neuralese to replace legible CoT in LLMs soon? I’d assume neuralese would have bad implications for evals/monitoring work, so I’m wondering what work people are planning to prioritize in anticipation of neuralese. Or, are there reasons why neuralese may not become the default (or why it’s not a big deal)?
The good news is that it’s really hard to train a model to use neuralese. Right now the way we train reasoning models is to first train them to imitate huge amounts of human-written text (base models) and then make comparatively small adjustments to turn them into reasoning models (RL). The base-model step only works if you have the exact output you want the LLM to produce, and we don’t have that for neuralese. The RL step needs a model that’s already reasonably good, and it doesn’t work well on base models (if the model isn’t close to right answers, it’s hard for RL to get useful feedback).
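To make the "exact output" point concrete, here's a toy numpy sketch of the next-token objective used in pretraining (illustrative only, not any lab's actual training code). It needs a ground-truth token id at every position, which a text corpus supplies but which has no analogue for continuous latents:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = 5
logits = rng.standard_normal((3, vocab))    # model outputs for 3 positions
targets = np.array([2, 0, 4])               # ground-truth next-token ids from the corpus

# Cross-entropy: reward the probability assigned to the known target token.
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
loss = -log_probs[np.arange(3), targets].mean()

# For neuralese there is no analogue of `targets`: the corpus contains no
# "correct" continuous vectors to supervise against, so this objective
# simply isn't available.
```

The whole construction hinges on `targets` existing, which is exactly what text gives you for free and neuralese doesn't.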
You can take a base model and then alter it to produce neuralese, but it doesn’t work very well, since neuralese is out of distribution relative to the model’s original training.
It’s also unclear if neuralese is even helpful from a performance perspective, since forcing outputs to be discrete helps them stay in-distribution.
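One way to see the in-distribution point: with discrete tokens, whatever the model feeds back to itself gets snapped to a row of the embedding table it was trained on, whereas a raw hidden state need not lie near any such row. A toy numpy sketch (all names here are made up for illustration, and whether the latent feedback is actually harmful in practice is an open question):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 4))   # embedding table: 5 tokens, dim 4
hidden = rng.standard_normal(4)     # a final hidden state from some forward pass

# Discrete CoT: pick the best-matching token, feed back its embedding.
# The next input is an exact row of `emb` -- the kind of input the model
# saw throughout training.
token = int(np.argmax(emb @ hidden))
discrete_next = emb[token]

# Coconut-style latent feedback: pass the raw hidden state forward.
# It generally doesn't coincide with any row of the embedding table.
latent_next = hidden
```

The discretization step acts as a projection back onto the training distribution, which is the intuition behind the performance concern above.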
That said, people are definitely trying to do this and it’s hard to predict what the next advance will be.
Someone I know claims to have found a way to directly pretrain neuralese models: https://aklein.bearblog.dev/zebra/
I’ve seen their prototype, and it definitely works, in the sense of producing reasonable text outputs while making non-trivial use of >100 continuous latents, but whether it actually amounts to anything remains to be seen.
Since OP contrasted neuralese with “legible CoT”, I’d like to add that while the “hard to train” argument may hold for neuralese, it doesn’t apply to o3-style Thinkish. Hopefully optimization pressures don’t favor that too much.
I was mostly thinking of Coconut, which I don’t think forces models to produce OOD outputs, but that’s also true.