Kei Nishimura-Gasparian comments on Fabien’s Shortform

Kei Nishimura-Gasparian 27 Mar 2025 4:20 UTC
1 point
0
Do you think this means that if Claude 3.7 Sonnet does in fact use steganography then your earlier experiment wouldn’t have found evidence for it, since you didn’t fine-tune for long enough for the model to learn the scheme?

Of course, this experiment also suggests one should lower one's estimate of the likelihood that Claude 3.7 Sonnet uses steganography at all.
- Fabien Roger 29 Mar 2025 15:37 UTC
  2 points
  0
  I think this would have been plausible if distilling the original scratchpads had resulted in significantly worse performance than the model I distilled from. But the gap is small, so I think we can exclude this possibility.