Le magicien quantique
Karma: 10
:eyes:
Exploring the multi-dimensional refusal subspace in reasoning models
Le magicien quantique · 27 Oct 2025 9:03 UTC · 5 points · 2 comments · 4 min read
Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Le magicien quantique · 18 Mar 2025 17:55 UTC · 6 points · 1 comment · 10 min read