
Bruce W. Lee

Karma: 381

brucewlee.com

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
234 points
43 comments, 8 min read
(arxiv.org)

Programming Refusal with Conditional Activation Steering

Bruce W. Lee, 11 Sep 2024 20:57 UTC
41 points
0 comments, 11 min read
(brucewlee.com)

Language Models Don’t Learn the Physical Manifestation of Language

22 Feb 2024 18:52 UTC
39 points
23 comments, 1 min read
(arxiv.org)