I wonder if there’s some accidental steganography—if you use an LLM to rewrite the shorter scenario, and maybe it has “this is a test” features active while doing that, nudging the text towards sounding like a test.
Hm. I’m unsure why an LLM would do steganography in this case.
Like the subliminal owls result? I know this isn’t distillation, but the same effect could show up in prompts.
Might be.