
martinkunev

Karma: 69

Disincentivizing deception in mesa optimizers with Model Tampering

martinkunev, 11 Jul 2023 0:44 UTC
3 points
0 comments, 2 min read

How useful is Corrigibility?

martinkunev, 12 Sep 2023 0:05 UTC
11 points
4 comments, 5 min read