Methuselah
Karma: 17
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms
Methuselah | 8 Apr 2026 21:36 UTC | 16 points | 6 comments | 2 min read