Scalable Oversight
Tag, last edited 18 Apr 2024 19:57 UTC by Raemon
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks, 18 Apr 2024 16:17 UTC. 102 points, 7 comments, 12 min read.
NYU Code Debates Update/Postmortem
David Rein, 24 May 2024 16:08 UTC. 15 points, 4 comments, 10 min read.