ValueShift Research
Karma: 18
Experiments on Refusal Shape in LLMs
ValueShift Research
2 Apr 2026 12:37 UTC, 7 points, 0 comments, 7 min read
Hello, World of Mechanistic Interpretability
ValueShift Research
15 Mar 2026 23:36 UTC, 8 points, 4 comments, 5 min read
First steps into mechanistic interpretability. Refusal is not a single direction
ValueShift Research
15 Mar 2026 23:36 UTC, 1 point, 0 comments, 3 min read