cloud

Karma: 1,234

Recontextualization Mitigates Specification Gaming Without Modifying the Specification

14 Oct 2025 0:53 UTC
109 points
11 comments · 9 min read · LW link

cloud’s Shortform

cloud · 17 Sep 2025 7:41 UTC
6 points
1 comment · 1 min read · LW link

Narrow finetuning is different

5 Aug 2025 14:29 UTC
66 points
3 comments · 4 min read · LW link

[Research Note] Optimizing The Final Output Can Obfuscate CoT

30 Jul 2025 21:26 UTC
196 points
22 comments · 6 min read · LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
340 points
39 comments · 4 min read · LW link

Selective Generalization: Improving Capabilities While Maintaining Alignment

16 Jul 2025 21:25 UTC
67 points
4 comments · 7 min read · LW link

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
234 points
43 comments · 8 min read · LW link
(arxiv.org)

Selective modularity: a research agenda

24 Mar 2025 4:12 UTC
66 points
2 comments · 24 min read · LW link

[Question] Is weak-to-strong generalization an alignment technique?

cloud · 31 Jan 2025 7:13 UTC
22 points
1 comment · 2 min read · LW link

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

6 Dec 2024 22:19 UTC
169 points
14 comments · 11 min read · LW link
(arxiv.org)