Rogan Inglis
Karma: 65
Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway
charlie_griffin, ollie, oliverfm, Rogan Inglis and Alan Cooney
15 Aug 2025 11:48 UTC
61 points, 3 comments, 17 min read
LW link
Sparse Features Through Time
Rogan Inglis
24 Jun 2024 18:06 UTC
12 points, 1 comment, 1 min read
LW link (roganinglis.io)