Joschka Braun
Karma: 81
A Conceptual Framework for Exploration Hacking
Joschka Braun, Eyon Jang and Damon Falck
12 Feb 2026 16:33 UTC
25 points, 2 comments, 9 min read
Exploration hacking: can reasoning models subvert RL?
Damon Falck, Joschka Braun and Eyon Jang
30 Jul 2025 22:02 UTC
22 points, 4 comments, 9 min read
A Sober Look at Steering Vectors for LLMs
Joschka Braun, Dmitrii Krasheninnikov, Usman Anwar, RobertKirk, Daniel Tan and David Scott Krueger (formerly: capybaralet)
23 Nov 2024 17:30 UTC
40 points, 0 comments, 5 min read