Do you have an explanation for why there is a bridge-entity representation mismatch for synthetic facts but not for real ones? What in “real” training allows LLMs to learn a common representation of input and output entities? Can you emulate that with additional fine-tuning on more synthetic documents?
We don’t have a good explanation. One idea is that bridge entities may need to be more deeply internalized to support latent two-hop reasoning: for example, they may need to occur in many facts as both first and second entities, or perhaps appear in other two-hop questions. The Grokked Transformers paper has some results linking the ratio of e2 and e3 to two-hop performance (in toy grokking settings).
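To make that hypothesis concrete, here is a minimal sketch (not from the paper; all names and the document format are illustrative assumptions) of how one might generate synthetic fact documents so that each bridge entity e2 shows up both as the second entity of one fact (e1 → e2) and as the first entity of another (e2 → e3):

```python
import random

# Hypothetical sketch: build synthetic two-hop chains e1 -(r1)-> e2 -(r2)-> e3
# so that every bridge entity e2 appears in both positions (second entity of
# one fact, first entity of another). The conjecture is that this richer
# coverage might help the bridge entity become "internalized".

ENTITIES = [f"entity_{i}" for i in range(100)]
RELATIONS = ["founded", "married", "mentored", "succeeded"]

def make_fact(e1: str, relation: str, e2: str) -> str:
    """Render a single atomic fact as a short training document."""
    return f"{e1} {relation} {e2}."

def make_two_hop_corpus(num_chains: int, seed: int = 0) -> list[str]:
    """Sample chains and emit two documents per chain."""
    rng = random.Random(seed)
    docs = []
    for _ in range(num_chains):
        e1, e2, e3 = rng.sample(ENTITIES, 3)
        r1, r2 = rng.choice(RELATIONS), rng.choice(RELATIONS)
        docs.append(make_fact(e1, r1, e2))  # e2 as the second entity
        docs.append(make_fact(e2, r2, e3))  # e2 as the first entity
    return docs

if __name__ == "__main__":
    for doc in make_two_hop_corpus(3):
        print(doc)
```

Whether fine-tuning on additional documents like these would actually close the representation gap is an open question; this only illustrates the kind of entity coverage the hypothesis points at.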