Scalable Oversight

Tag. Last edit: 18 Apr 2024 19:57 UTC by Raemon

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks, 18 Apr 2024 16:17 UTC
101 points
7 comments, 12 min read