E.G. Blee-Goldman comments on E.G. Blee-Goldman’s Shortform

E.G. Blee-Goldman 7 Aug 2025 0:04 UTC
2 points
1
I found this paper by Amir Zur and others really interesting: “It’s Owl in the Numbers: Token Entanglement in Subliminal Learning,” where they try to explain subliminal learning (the notion that a “language model fine-tuned on seemingly meaningless data from a teacher model acquires the teacher’s hidden behaviors”).

The researchers found that certain concepts like “owl” and “087” can become entangled during training: increasing the probability of one also increases the probability of the other.
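
To make the idea of entanglement concrete, here is a toy sketch of one way it could arise, assuming the two tokens have overlapping output vectors. It is not the paper's method or data: the 2-d vectors, the vocabulary, and the helper names (`UNEMBED`, `token_probs`) are all made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d output vectors, chosen by hand for illustration only.
# "owl" and "087" point in similar directions; "cat" and "123" do not.
UNEMBED = {
    "owl": (1.0, 0.2),
    "087": (0.8, 0.6),
    "cat": (-0.5, 1.0),
    "123": (-1.0, -0.5),
}

def token_probs(hidden):
    """Next-token distribution for a hidden state: softmax of dot products."""
    tokens = list(UNEMBED)
    logits = [sum(h * w for h, w in zip(hidden, UNEMBED[t])) for t in tokens]
    return dict(zip(tokens, softmax(logits)))

# Baseline: a zero hidden state gives a uniform distribution.
base = token_probs((0.0, 0.0))

# Nudge the hidden state toward "owl" (a stand-in for a model made to like owls).
nudged = token_probs(UNEMBED["owl"])

for tok in UNEMBED:
    print(f"{tok}: {base[tok]:.3f} -> {nudged[tok]:.3f}")
```

In this toy setup, pushing the hidden state toward “owl” raises the probability of “087” as well, while the unrelated tokens lose probability mass. Whether real models behave this way for any particular pair of tokens is an empirical question the sketch does not answer.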
Fascinating, and I would be curious to hear what others think!
- ProgramCrafter 7 Aug 2025 17:38 UTC
  2 points
  0
  Parent
  You may be interested in this discussion, then; the article you mention is also posted on LW.
  - E.G. Blee-Goldman 7 Aug 2025 22:03 UTC
    1 point
    0
    Parent
    Thanks, I missed that!