RSS

Euan Ong

Karma: 313

https://​​ong.ac

Ac­ti­va­tion Or­a­cles: Train­ing and Eval­u­at­ing LLMs as Gen­eral-Pur­pose Ac­ti­va­tion Explainers

18 Dec 2025 20:21 UTC
153 points
11 comments8 min readLW link
(arxiv.org)

Build­ing and eval­u­at­ing al­ign­ment au­dit­ing agents

24 Jul 2025 19:22 UTC
47 points
1 comment5 min readLW link

Au­dit­ing lan­guage mod­els for hid­den objectives

13 Mar 2025 19:18 UTC
145 points
15 comments13 min readLW link

Image Hi­jacks: Ad­ver­sar­ial Images can Con­trol Gen­er­a­tive Models at Runtime

20 Sep 2023 15:23 UTC
58 points
9 comments1 min readLW link
(arxiv.org)