For example, if you ask an LLM a question like “Who was the sister of the mother of the uncle of … X?”, every step of this necessarily requires at least one layer in the model, and an LLM can’t[1] answer it without CoT if it doesn’t have enough layers.
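To make the depth point concrete, here’s a toy sketch (plain Python over a made-up three-person “knowledge base”, not a transformer): each hop of the question needs the previous hop’s answer, so the hops have to happen one after another, whether that’s roughly one layer per hop inside the model or one written-out step per hop in CoT.

```python
# Toy illustration: a k-hop relational query is a chain of dependent lookups.
# The relation names and the tiny family graph below are invented for the example.
KB = {
    "X":     {"uncle": "Bob"},
    "Bob":   {"mother": "Carol"},
    "Carol": {"sister": "Dana"},
}

def answer_without_cot(person, relations):
    """All hops resolved 'in the head': each lookup depends on the previous result."""
    for rel in relations:
        person = KB[person][rel]
    return person

def answer_with_cot(person, relations):
    """Each hop is written out as text and re-read, like chain of thought."""
    transcript = []
    for rel in relations:
        person = KB[person][rel]
        transcript.append(f"The {rel} of the previous person is {person}.")
    return person, transcript

# "Who was the sister of the mother of the uncle of X?" (read inside-out)
print(answer_without_cot("X", ["uncle", "mother", "sister"]))   # Dana
answer, steps = answer_with_cot("X", ["uncle", "mother", "sister"])
print("\n".join(steps))
print("Answer:", answer)
```

The loop body is the same in both cases; the only difference is whether the intermediate names stay internal or get written out where the model can read them back.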
It’s harder to construct examples that can’t be written out in a chain of thought, but a question of the form “What else did you think the last time you thought about X?” would require this (or “What did you think about our conversation about X’s mom?”), and CoT doesn’t help, since reading its own outputs and drawing inferences from them isn’t introspection[2].
It’s unclear how much of a limitation this really is: in many cases CoT could reduce the complexity of the query, and it’s not obvious how well humans can do this either, but there’s plausibly more thought going on in our heads than what shows up in our internal dialogue[3].
[1] I guess technically an LLM could parallelize this question by considering the answer for every possible X and every possible path through the relationship graph, but that model would be implausibly large (a rough size estimate is sketched after the footnotes).
[2] I can read a diary and say “I must have felt sad when I wrote that”, but that’s not the same as remembering how I felt when I wrote it.
[3] Especially since some people claim not to think in words at all. Also, some mathematicians claim to be able to imagine complex geometry and reason about it in their heads.
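A back-of-the-envelope sketch of why the lookup-table approach in [1] blows up (N and R below are made-up illustrative numbers, not anything from the post): with N people and R relation types, pre-answering every k-hop chain needs on the order of N × R^k entries.

```python
# Rough size of a table that pre-answers every k-hop relational question.
# N and R are illustrative guesses, not real statistics.
N = 1_000_000   # people the model might be asked about
R = 30          # relation types (mother, uncle, sister, ...)

for k in range(1, 7):
    print(f"{k} hops: ~{N * R**k:,} precomputed answers")
```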
I think this is more about causal masking (which we do on purpose for the reasons you mention)?
I was thinking about how LLMs are limited in the sequential reasoning they can do “in their head”, and once it’s not in their head, it’s not really introspection.