
Euan Ong

Karma: 178

https://ong.ac

Building and evaluating alignment auditing agents

24 Jul 2025 19:22 UTC
47 points
1 comment, 5 min read, LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
142 points
15 comments, 13 min read, LW link

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

20 Sep 2023 15:23 UTC
58 points
9 comments, 1 min read, LW link
(arxiv.org)