Henry Sleight

Karma: 254

Reward hacking behavior can generalize across tasks

28 May 2024 16:33 UTC
78 points
5 comments · 21 min read · LW link

MATS Winter 2023-24 Retrospective

11 May 2024 0:09 UTC
84 points
28 comments · 49 min read · LW link

Inducing Unprompted Misalignment in LLMs

19 Apr 2024 20:00 UTC
38 points
6 comments · 16 min read · LW link

How I select alignment research projects

10 Apr 2024 4:33 UTC
35 points
4 comments · 24 min read · LW link

Templates I made to run feedback rounds for Ethan Perez’s research fellows.

Henry Sleight · 28 Mar 2024 19:41 UTC
33 points
0 comments · 10 min read · LW link

Reading writing advice doesn’t make writing easier

Henry Sleight · 7 Feb 2024 19:14 UTC
17 points
0 comments · 5 min read · LW link
(open.substack.com)