Elliott Thornley

Karma: 1,198

elliott-thornley.com

Risk-Averse AIs

24 Jun 2026 11:35 UTC
35 points
26 comments
5 min read
LW link
(www.forethought.org)

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

3 Jun 2026 14:24 UTC
20 points
3 comments
19 min read
LW link
(arxiv.org)

Maybe we should pretrain on synthetic data about good-but-reward-hacking AIs

Elliott Thornley
29 May 2026 14:50 UTC
12 points
4 comments
3 min read
LW link