dylan_f

Karma: 204

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
152 points
8 comments · 8 min read · LW link

Concept Poisoning: Probing LLMs without probes

5 Aug 2025 17:00 UTC
60 points
5 comments · 13 min read · LW link