Marie_DB

Karma: 296

Automated Alignment is Harder Than You Think

14 May 2026 22:01 UTC
143 points
7 comments · 3 min read · LW link
(arxiv.org)

An alignment safety case sketch based on debate

8 May 2025 15:02 UTC
62 points
21 comments · 25 min read · LW link
(arxiv.org)

UK AISI’s Alignment Team: Research Agenda

7 May 2025 16:33 UTC
115 points
3 comments · 11 min read · LW link