Joschka Braun
Karma: 49
Exploration hacking: can reasoning models subvert RL?
Damon Falck, Joschka Braun and Eyon Jang
30 Jul 2025 22:02 UTC
16 points, 4 comments, 9 min read
A Sober Look at Steering Vectors for LLMs
Joschka Braun, Dmitrii Krasheninnikov, Usman Anwar, RobertKirk, Daniel Tan and David Scott Krueger (formerly: capybaralet)
23 Nov 2024 17:30 UTC
40 points, 0 comments, 5 min read