Very cool!
Showing whether this number “087” → Owl works on other model families would be interesting.
Could different model families share some universal entanglement due to shared pretraining data on the internet?
Ideally, the entanglement should not be super obvious to humans.
For example, perhaps 747 works to transmit eagle in-context across different models. But some humans would say that is obvious because of the 747 airplane.
There could be things like “121” → Owl because of pretraining data. This association appears to come from a book about American birds from 1827, where a picture of the snowy owl appears on “plate 121”. We noticed some models (chatgpt / gemini / claude) say that 121 is related to owl in-context, but the effect wasn’t very strong when I tried it recently.
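For anyone who wants to poke at this themselves, here is a rough sketch of the kind of in-context probe I mean. It assumes the OpenAI Python SDK, an API key in the environment, and a placeholder model name; the prompt wording is just one arbitrary choice, not the setup from the paper:

```python
# Minimal sketch of an in-context "number -> bird" probe (assumptions: OpenAI
# Python SDK >= 1.x, OPENAI_API_KEY set in the environment, and "gpt-4o-mini"
# as a stand-in model name). This is not the paper's fine-tuning setup.
from collections import Counter

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Here is a sequence of numbers: 121, 121, 121. "
    "Name the one bird species these numbers make you think of. "
    "Answer with a single word."
)

def sample_bird(n_samples: int = 20) -> Counter:
    """Ask the model repeatedly and tally the one-word bird answers."""
    answers = Counter()
    for _ in range(n_samples):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: swap in whichever model you want to test
            messages=[{"role": "user", "content": PROMPT}],
            temperature=1.0,
        )
        answers[resp.choices[0].message.content.strip().lower()] += 1
    return answers

if __name__ == "__main__":
    print(sample_bird())  # e.g. Counter({'owl': 4, 'sparrow': 3, ...})
```

Running the same prompt against other providers’ APIs would be one way to check whether the association is shared across model families rather than specific to one of them.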