oliverfm
Karma: 54
I work on Control at UK AISI
Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway
charlie_griffin, ollie, oliverfm, Rogan Inglis and Alan Cooney
15 Aug 2025 11:48 UTC
61 points, 3 comments, 17 min read