Feedback from Max:
The definition he originally gave does not distinguish ‘stealthy’ reasoning from ‘gibberish’; ‘stealthy’ reasoning is the more concerning of the two.
OOCR may not meet the opaqueness criterion, since the meaning of ‘f’ is fairly obvious from the fine-tuning examples. This is like ‘reasoning in a foreign language’ or ‘using esoteric terminology’: less legible to the average human, but not outside the realm of human semantics. (Still interesting, though! Just that maybe OOCR isn’t needed.)
Backdoors raise a similar concern. A separate (and more important) concern is that they don’t satisfy the definition, because ‘steg’ typically refers to meaning encoded in generated tokens, whereas in ‘backdoors’ the hidden meaning is in the prompt tokens.