Tag: Scalable Oversight
Last edit: 18 Apr 2024 19:57 UTC by Raemon
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks, 18 Apr 2024 16:17 UTC, 101 points, 7 comments, 12 min read