RE part 6:
I think there’s a more intuitive/abstract framing here. If a model has only seen e_2 with respect to two different facts, it probably won’t have generated an abstraction for e_2 in its world model at all. An abstraction is mostly useful as a hub of different inferences, like in the old blegg/rube diagram.
Something which has come up in pretraining will already be an abstraction with an easy-to-reach-for handle that the model can pull.
Might be testable by fine-tuning on only some of (or some pairs of) the spokes of a blegg/rube diagram, to see whether the final spoke-pairs fill in.
I.e.
“This object is round, so it’s a blegg, so it’s blue”
“This object is smooth, so it’s a blegg, so it’s round”
“This object is smooth, so it’s a blegg, so it’s bouncy”
“This object is round, is it bouncy?”
Something like that might cause “blegg” to be bound up and assembled into an abstraction in the AI, with a single representation.
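To make the proposed test a bit more concrete, here is a minimal sketch of how the fine-tuning/evaluation split might be constructed for a toy "blegg" hub with four spokes. The property names, prompt format, and split are my own illustration, not taken from the post or the original paper:

```python
# Sketch of the spoke-pair experiment: fine-tune on some ordered spoke pairs
# routed through the hub concept "blegg", then probe held-out pairs to see
# whether they "fill in". All names and formats here are hypothetical.
from itertools import permutations

# Properties ("spokes") that all hang off the hub concept.
SPOKES = ["round", "smooth", "blue", "bouncy"]

def spoke_pair_example(premise, conclusion, hub="blegg"):
    """One fine-tuning string in the style of the examples above."""
    return f"This object is {premise}, so it's a {hub}, so it's {conclusion}."

# Train on only some ordered spoke pairs...
train_pairs = [("round", "blue"), ("smooth", "round"), ("smooth", "bouncy")]
# ...and hold out the remaining pairs as probes.
held_out_pairs = [p for p in permutations(SPOKES, 2) if p not in train_pairs]

train_set = [spoke_pair_example(a, b) for a, b in train_pairs]
eval_prompts = [f"This object is {a}, is it {b}?" for a, b in held_out_pairs]

if __name__ == "__main__":
    print("Fine-tuning examples:")
    for s in train_set:
        print(" ", s)
    print("Held-out probes (e.g. round -> bouncy):")
    for s in eval_prompts:
        print(" ", s)
```

If the held-out probes succeed after fine-tuning only on the training pairs, that would suggest "blegg" got bound into a single hub-like representation rather than a set of memorised pairwise associations.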
Overall I consider this work to be only weak evidence in favour of multi-step reasoning being an issue, since the latter parts show that it definitely can occur (just not when both facts are fine-tuned separately).
Yeah, it seems plausible that the entity being activated across different contexts is necessary for it to be represented saliently enough to facilitate multi-hop reasoning. The Grokked Transformer paper has some results linking the ratio of e2 and e3 to two-hop performance (in toy settings).