Joschka Braun

Karma: 49

Exploration hacking: can reasoning models subvert RL?

30 Jul 2025 22:02 UTC
16 points
4 comments, 9 min read

A Sober Look at Steering Vectors for LLMs

23 Nov 2024 17:30 UTC
40 points
0 comments, 5 min read