Euan Ong

Karma: 525

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Subhash Kantamneni, kitft, Euan Ong and Sam Marks

7 May 2026 20:21 UTC

215 points

35 comments · 8 min read · LW link

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas and Owain_Evans

18 Dec 2025 20:21 UTC

154 points

12 comments · 8 min read · LW link

(arxiv.org)

Building and evaluating alignment auditing agents

Sam Marks, trentbrick, RowanWang, Sam Bowman, Euan Ong, Johannes Treutlein and evhub

24 Jul 2025 19:22 UTC

47 points

1 comment · 5 min read · LW link

Auditing language models for hidden objectives

Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei Nishimura-Gasparian, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub

13 Mar 2025 19:18 UTC

155 points

15 comments · 13 min read · LW link

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Scott Emmons, Luke Bailey and Euan Ong

20 Sep 2023 15:23 UTC

58 points

9 comments · 1 min read · LW link

(arxiv.org)