lennie

Karma: 21

Which differences between sandbagging evaluations and sandbagging safety research are important for control?

lennie, 6 Oct 2025 18:20 UTC
1 point
0 comments, 11 min read

Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.

lennie, 6 Oct 2025 14:00 UTC
1 point
0 comments, 8 min read

[Question] Feedback request: Is the time right for an AI Safety stack exchange?

lennie, 26 Sep 2025 9:14 UTC
22 points
0 comments, 4 min read