LAThomson (Louis Thomson)

Karma: 81

3rd-year undergrad Computer Science and Philosophy student at Oxford, and aspiring AI Safety researcher :)

Towards shutdownable agents via stochastic choice

8 Jul 2024 10:14 UTC
50 points
5 comments, 23 min read, LW link
(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

8 Nov 2023 11:37 UTC
49 points
0 comments, 18 min read, LW link