Do you mean something like:
Suppose a model learns “A->B” and “B->C” as separate facts. These get stored in the weights, probably somewhere across the feedforward layers. They can’t be combined unless both facts are loaded into the residual stream/token stream at the same time, which might not happen. And even if it does, the model won’t remember “A->C” as a standalone fact in the future; it has to re-compute it every time.
Sure. But more than the immediate associative leaps, I think I’m interested in their ability to sample concepts across very different domains and find connections, whether that happens deliberately or at random. Though with humans, the ideas that plague our subconscious are tied to our persistent, internal questions.
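
As a toy sketch of the “A->B, B->C” picture in that exchange: plain Python dictionaries standing in for facts stored in feedforward weights. This is purely illustrative, not a real transformer, and all the names are made up.

```python
# Toy illustration only: dict lookups stand in for facts baked into feedforward weights.

def forward(token, layers):
    stream = token                          # the "residual stream" starts as the input token
    for layer in layers:
        stream = layer.get(stream, stream)  # a layer can only add what its own "weights" store
    return stream

fact_ab = {"A": "B"}  # the fact A -> B, stored in an early layer
fact_bc = {"B": "C"}  # the fact B -> C, stored in a later layer

print(forward("A", [fact_ab, fact_bc]))  # 'C': the hop works because 'B' lands in the stream in between
print(forward("A", [fact_bc, fact_ab]))  # 'B': same facts, wrong order, so the hop never completes

# Nothing is written back: neither table ever gains an "A" -> "C" entry,
# so the two-hop answer has to be re-computed on every forward pass.
```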