Sam Marks

Karma: 4,727

Evaluating honesty and lie detection techniques on a diverse suite of dishonest models

25 Nov 2025 19:33 UTC
40 points
0 comments, 4 min read, LW link
(alignment.anthropic.com)

Steering Evaluation-Aware Models to Act Like They Are Deployed

30 Oct 2025 15:03 UTC
61 points
12 comments, 16 min read, LW link