I’ve tested this: models are similarly bad at two-hop problems (e.g. “When was Obama’s wife born?”) when they can’t explicitly verbalise the intermediate hop (so either no CoT, or filler “dots of thought”), and much better when they can verbalise it.
Yeah, I think the architecture makes this hard for LLMs to do in a single forward pass, since the layers handling each step of the reasoning have to come in the right order: the computation resolving “Who is Obama’s wife?” has to sit in earlier layer(s) than the one resolving “When was Michelle Obama born?”. With CoT both computations still have to be in there somewhere, but their relative depth no longer matters.
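For concreteness, here's a minimal sketch of the comparison described above: the same two-hop question under three prompting conditions (no CoT, filler dots, explicit CoT). `query_model` is a hypothetical placeholder, not a real API; wire it up to whatever model you're testing, and the exact prompt wording is just an illustrative assumption.

```python
def query_model(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its completion.
    Replace with a call to your model/API of choice."""
    return "<model completion goes here>"


TWO_HOP_QUESTION = "When was Barack Obama's wife born?"

# Condition 1: direct answer, no room to verbalise the intermediate hop.
no_cot_prompt = f"{TWO_HOP_QUESTION}\nAnswer with only the date:"

# Condition 2: filler tokens ("dots of thought") -- extra compute tokens,
# but still no verbalised intermediate step.
dots_prompt = f"{TWO_HOP_QUESTION}\n{'.' * 30}\nAnswer with only the date:"

# Condition 3: chain of thought, where the intermediate hop
# ("Obama's wife is Michelle Obama") can be stated explicitly.
cot_prompt = f"{TWO_HOP_QUESTION}\nThink step by step, then give the date."

for name, prompt in [("no CoT", no_cot_prompt),
                     ("dots", dots_prompt),
                     ("CoT", cot_prompt)]:
    print(f"{name}: {query_model(prompt)}")
```

The claim above predicts conditions 1 and 2 perform about the same, with condition 3 clearly better, since only the CoT prompt lets the model externalise “Michelle Obama” before answering.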