
Eyon Jang

Karma: 88

AI safety researcher; MATS 8.0 scholar

Exploration Hacking: Can LLMs Learn to Resist RL Training?

1 May 2026 20:54 UTC
17 points
0 comments · 8 min read · LW link

Helping Friends, Harming Foes: Testing Tribalism in Language Models

11 Mar 2026 12:06 UTC
10 points
0 comments · 9 min read · LW link

A Conceptual Framework for Exploration Hacking

12 Feb 2026 16:33 UTC
26 points
2 comments · 9 min read · LW link

Exploration hacking: can reasoning models subvert RL?

30 Jul 2025 22:02 UTC
25 points
4 comments · 9 min read · LW link

Automating AI Safety: What we can do today

25 Jul 2025 14:49 UTC
38 points
0 comments · 8 min read · LW link