
Elliott Thornley (EJT)

Karma: 1,115

elliott-thornley.com

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

3 Jun 2026 14:24 UTC
20 points
3 comments, 19 min read, LW link
(arxiv.org)

Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley (EJT), 29 May 2026 14:50 UTC
12 points
4 comments, 3 min read, LW link

AI safety can be a Pascal’s mugging even if p(doom) is high

Elliott Thornley (EJT), 25 Apr 2026 16:16 UTC
29 points
10 comments, 1 min read, LW link