Scalable Oversight

Tag, last edited 18 Apr 2024 19:57 UTC by Raemon

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks, 18 Apr 2024 16:17 UTC
102 points
7 comments, 12 min read

NYU Code Debates Update/Postmortem

David Rein, 24 May 2024 16:08 UTC
15 points
4 comments, 10 min read