jacob_drori

Karma: 620

[Paper] Output Supervision Can Obfuscate the CoT

20 Nov 2025 22:41 UTC
72 points
2 comments · 5 min read · LW link
(arxiv.org)

jacob_drori’s Shortform

jacob_drori · 1 Aug 2025 17:47 UTC
7 points
6 comments · 1 min read · LW link

[Research Note] Optimizing The Final Output Can Obfuscate CoT

30 Jul 2025 21:26 UTC
197 points
22 comments · 6 min read · LW link

SAE on activation differences

30 Jun 2025 17:50 UTC
44 points
3 comments · 5 min read · LW link