Bruce W. Lee
Karma: 381
brucewlee.com

Distillation Robustifies Unlearning
Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud and TurnTrout
13 Jun 2025 13:45 UTC · 234 points · 43 comments · 8 min read
LW link (arxiv.org)

Programming Refusal with Conditional Activation Steering
Bruce W. Lee
11 Sep 2024 20:57 UTC · 41 points · 0 comments · 11 min read
LW link (brucewlee.com)

Language Models Don’t Learn the Physical Manifestation of Language
Bruce W. Lee and Jaehyuk Lim
22 Feb 2024 18:52 UTC · 39 points · 23 comments · 1 min read
LW link (arxiv.org)