That was also my idea at first, but then we have the Wagner group one, so this is probably a false lead.
janczarknurek
I really like that I'm seeing more discussion of "ok, even if we manage to avoid x-risk, what then?", e.g. the recent papers on AI-enabled coups and so on. To the point, however: I think the problem runs deeper. What I fear most is that by "Western values imbued in AGI" people mean "we create an everlasting upper class with no class mobility, because capital is all that matters and we freeze the capital structure; you will get UBI, so you should be grateful."
It probably makes sense to keep a capitalist structure between ASIs, but between humans? That seems like a very bad outcome to me (a "you will live in a pod and you will be happy" type of endgame for the masses).
Very cool paper!
I wonder whether it could have any applications in mundane model safety when it comes to open-source models finetuned on a private dataset and shared via API. In particular: how much interesting stuff can you extract using the same base model finetuned on the harmless outputs of the "private model"?
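
A rough sketch of the kind of probe I mean, assuming the private finetune sits behind an OpenAI-style completions endpoint and was tuned from a known public base checkpoint. The endpoint URL, base model name, and prompts below are all placeholder assumptions, not anything from the paper:

```python
import requests
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

PRIVATE_API_URL = "https://example.com/v1/completions"  # hypothetical endpoint
BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # assumed to be the private model's base

# 1. Collect harmless completions from the private model via its API.
harmless_prompts = ["Summarize the water cycle.", "Explain photosynthesis."]
records = []
for prompt in harmless_prompts:
    resp = requests.post(PRIVATE_API_URL,
                         json={"prompt": prompt, "max_tokens": 256})
    records.append({"text": prompt + resp.json()["choices"][0]["text"]})

# 2. Finetune the same public base model on those harmless outputs.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL,
                                             torch_dtype=torch.bfloat16)

dataset = Dataset.from_list(records).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="surrogate", num_train_epochs=3,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# 3. Compare the surrogate against the untuned base model on sensitive prompts:
#    any behavior present in the surrogate but absent from the base model must
#    have leaked through the "harmless" outputs alone.
```

If the surrogate's behavior on held-out sensitive prompts drifts measurably toward the private model's, that would suggest even innocuous API traffic leaks something about the private finetune.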