Jiachen Zhao
Karma: 25
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, 22 Jul 2025 18:53 UTC
24 points, 5 comments, 8 min read
LW link (www.arxiv.org)