Dhruv Trehan
Karma: 6
Saying “for AI safety research” made models refuse more on a harmless task
Dhruv Trehan
8 Sep 2025 19:39 UTC, 7 points, 1 comment, 2 min read, LW, link (lossfunk.substack.com)