Bruce W. Lee

Karma: 521

Preliminary Explorations on Latent Side Task Uplift

Bruce W. Lee · 2 Apr 2026 2:23 UTC
13 points
0 comments · 4 min read · LW link

Reasoning Models Struggle to Control Their Chains of Thought

5 Mar 2026 22:37 UTC
76 points
9 comments · 3 min read · LW link

Salient Directions in AI Control

Bruce W. Lee · 5 Mar 2026 19:38 UTC
13 points
0 comments · 14 min read · LW link
(brucewlee.com)

Training Agents to Self-Report Misbehavior

25 Feb 2026 17:50 UTC
26 points
0 comments · 8 min read · LW link

Bruce W. Lee’s Shortform

Bruce W. Lee · 19 Feb 2026 1:53 UTC
4 points
2 comments · 1 min read · LW link

Bitter Lessons from Distillation Robustifies Unlearning

Bruce W. Lee · 28 Nov 2025 1:31 UTC
27 points
3 comments · 7 min read · LW link
(www.lesswrong.com)

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
239 points
43 comments · 8 min read · LW link
(arxiv.org)

Programming Refusal with Conditional Activation Steering

Bruce W. Lee · 11 Sep 2024 20:57 UTC
41 points
0 comments · 11 min read · LW link
(brucewlee.com)