Very cool paper!
I wonder whether this has any applications to mundane model safety for open-source models that are fine-tuned on a private dataset and shared via an API. In particular, how much interesting information could you extract by fine-tuning the same base model on the harmless outputs of the "private model"?