Eyon Jang
Karma: 88
AI safety researcher; MATS 8.0 scholar
Exploration Hacking: Can LLMs Learn to Resist RL Training?
Eyon Jang, Joschka Braun, Damon Falck and David Lindner
1 May 2026 20:54 UTC · 17 points · 0 comments · 8 min read · LW link
Helping Friends, Harming Foes: Testing Tribalism in Language Models
Irakli Shalibashvili, Jer Ren Wong, Moksh Nirvaan, Diogo Cruz and Eyon Jang
11 Mar 2026 12:06 UTC · 10 points · 0 comments · 9 min read · LW link
A Conceptual Framework for Exploration Hacking
Joschka Braun, Eyon Jang and Damon Falck
12 Feb 2026 16:33 UTC · 26 points · 2 comments · 9 min read · LW link
Exploration hacking: can reasoning models subvert RL?
Damon Falck, Joschka Braun and Eyon Jang
30 Jul 2025 22:02 UTC · 25 points · 4 comments · 9 min read · LW link
Automating AI Safety: What we can do today
Matthew Shinkle, Eyon Jang and jacquesthibs
25 Jul 2025 14:49 UTC · 38 points · 0 comments · 8 min read · LW link