Strings of numbers have been shown to transmit a fondness for owls, yet numbers have no semantic content related to owls. Doesn’t this point to ‘tokens containing much more information than their semantic content’?
I don’t understand how a token alone can contain more than ~17 bits of information. I think you should instead be talking about associations that tokens evoke in some LLMs and not in others (e.g. if they had wildly different initializations, like GPT-4.1 and Qwen).
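For context, the ~17-bit figure follows from the size of a typical tokenizer vocabulary: a single token drawn from a vocabulary of size V can carry at most log2(V) bits. A quick sketch, assuming a vocabulary of around 150,000 tokens (exact sizes vary by model):

```python
import math

# A token sampled from a vocabulary of size V carries at most log2(V) bits.
# Assuming ~150,000 tokens, a round figure in the range of modern LLM
# vocabularies (actual sizes differ by model).
vocab_size = 150_000
bits_per_token = math.log2(vocab_size)
print(f"{bits_per_token:.1f} bits")  # ≈ 17.2 bits
```

So even a perfectly efficient encoding can only pack on the order of 17 bits into one token choice; anything beyond that has to live in the associations the token triggers inside the model, not in the token itself.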
Neuralese is supposed to let the model process FAR more information per step than just the CoT and the newly generated token.
You’re disagreeing with a claim I didn’t intend to make.
I was unclear in my language and shouldn’t have used ‘contains’. Sorry! Maybe ‘relaying’ would have avoided this confusion.
I think your only objection to the broader point is that ‘neuralese requires very high bandwidth’, but LLMs can make a huge number of associations while processing a single token (which is, potentially, an enormous amount of bandwidth).
@StanislavKrym can you explain your disagree vote?