cloud

Karma: 1,434

Apply for Alignment Mentorship from TurnTrout and Alex Cloud

26 Dec 2025 17:20 UTC
43 points
0 comments · 2 min read · LW link
(turntrout.com)

[Paper] Output Supervision Can Obfuscate the CoT

20 Nov 2025 22:41 UTC
75 points
3 comments · 5 min read · LW link
(arxiv.org)

Omniscaling to MNIST

cloud · 8 Nov 2025 19:42 UTC
87 points
3 comments · 10 min read · LW link

Recontextualization Mitigates Specification Gaming Without Modifying the Specification

14 Oct 2025 0:53 UTC
130 points
15 comments · 11 min read · LW link

cloud’s Shortform

cloud · 17 Sep 2025 7:41 UTC
6 points
1 comment · 1 min read · LW link

Narrow finetuning is different

5 Aug 2025 14:29 UTC
68 points
3 comments · 4 min read · LW link

[Research Note] Optimizing The Final Output Can Obfuscate CoT

30 Jul 2025 21:26 UTC
199 points
23 comments · 6 min read · LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
343 points
39 comments · 4 min read · LW link

Selective Generalization: Improving Capabilities While Maintaining Alignment

16 Jul 2025 21:25 UTC
67 points
4 comments · 7 min read · LW link