
Julian Stastny

Karma: 174

Associate member of technical staff @ Redwood Research

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

8 May 2025 19:06 UTC
67 points
1 comment · 15 min read · LW link

7+ tractable directions in AI control

28 Apr 2025 17:12 UTC
83 points
1 comment · 13 min read · LW link

Disentangling four motivations for acting in accordance with UDT

Julian Stastny · 5 Nov 2023 21:26 UTC
35 points
3 comments · 7 min read · LW link