Joschka Braun
Karma: 101
Exploration Hacking: Can LLMs Learn to Resist RL Training?
Eyon Jang, Joschka Braun, Damon Falck and David Lindner
1 May 2026 20:54 UTC, 18 points, 0 comments, 8 min read
A Conceptual Framework for Exploration Hacking
Joschka Braun, Eyon Jang and Damon Falck
12 Feb 2026 16:33 UTC, 26 points, 2 comments, 9 min read
Exploration hacking: can reasoning models subvert RL?
Damon Falck, Joschka Braun and Eyon Jang
30 Jul 2025 22:02 UTC, 25 points, 4 comments, 9 min read
A Sober Look at Steering Vectors for LLMs
Joschka Braun, Dmitrii Krasheninnikov, Usman Anwar, RobertKirk, Daniel Tan and David Scott Krueger
23 Nov 2024 17:30 UTC, 42 points, 0 comments, 5 min read