Yeah, we present a variety of such transformations in our steganography experiments in the subsection "Measuring behavioral properties of NLAs." It seems reasonable to explore adding them into the training loop as a preventative measure.
Subhash Kantamneni
I agree. I generally like the multihop reasoning setting, and it's one of the first we looked at to build confidence that NLAs are doing something reasonable. For instance, this result used an early version of NLAs on Haiku 3.5.
It’s worth noting that NLAs struggle with numbers (they’re the type of specific detail that gets confabulated). But I’m a little confused: the capital of France is Paris, right? So the answer should be (8-5)*2 = 6, and we shouldn’t see 13 as an intermediate result. We also might, in general, need to run the NLA on more token positions.
We’re fans of the Consistency Lens and shout it out in our acknowledgements!