Oliver Daniels comments on Lessons from Studying Two-Hop Latent Reasoning

Oliver Daniels 19 Sep 2025 21:41 UTC
1 point
Maybe this baseline is too high, because it assumes perfect retrieval of the document. Instead, you could just measure how often the model responds with an incorrect answer from the same document. If correct answers are more frequent than these same-document incorrect answers, that is evidence of multi-hop reasoning.
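A rough sketch of that comparison in Python, assuming each evaluation record carries the model's response, the correct answer, and the other answers that appear in the same document. The field names (`response`, `correct`, `distractors`) and the exact-match scoring are hypothetical choices for illustration, not the paper's actual setup.

```python
from collections import Counter


def _norm(text):
    # Hypothetical normalization: case-insensitive exact match after trimming.
    return text.strip().lower()


def same_document_baseline(records):
    """Return the fraction of responses in each bucket.

    Each record is a dict with (assumed) keys:
      'response'    - the model's answer
      'correct'     - the correct two-hop answer
      'distractors' - other answers that appear in the same document
    """
    counts = Counter()
    for record in records:
        response = _norm(record["response"])
        if response == _norm(record["correct"]):
            counts["correct"] += 1
        elif response in {_norm(d) for d in record["distractors"]}:
            counts["same_doc_incorrect"] += 1
        else:
            counts["other"] += 1

    total = sum(counts.values())
    buckets = ("correct", "same_doc_incorrect", "other")
    if total == 0:
        return {bucket: 0.0 for bucket in buckets}
    return {bucket: counts[bucket] / total for bucket in buckets}
```

If the `correct` rate is clearly above the `same_doc_incorrect` rate, that points in the direction described above. Exact string matching is a simplification here; a real evaluation would likely need a more forgiving answer matcher.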